Abstract

Many applications of compressed sensing are based on variable density sampling, where certain sections of the sampling coefficients are sampled more densely. It has also been observed that these sampling schemes depend not only on sparsity but also on the sparsity structure of the underlying signal. This paper extends the result of (Adcock, Hansen, Poon and Roman, arXiv:1302.0561, 2013) to the case where the sparsifying system forms a tight frame. We divide the sampling coefficients into levels. Our main result then describes how the amount of subsampling in each level is determined by two quantities: the local coherences between the sampling and sparsifying operators, and the localized level sparsities, i.e. the sparsity in each level under the sparsifying operator.


1 Introduction 

Over the past decades, much of the research in signal processing has been based on the assumption that natural signals can be sparsely represented. One achievement resulting from this realization was compressed sensing, which made it possible to recover a sparse signal from very few non-adaptive linear measurements. Compressed sensing is typically modelled as follows. Given an unknown vector $x \in \mathbb{C}^N$ and a measurement device represented by a matrix $V$, one aims to recover $x$ from a highly incomplete set of measurements by solving

$$R(x,\Omega) \in \operatorname*{argmin}_{z \in \mathbb{C}^N} \|Dz\|_{\ell^1} \text{ subject to } P_\Omega V z = P_\Omega V x. \tag{1.1}$$

Here $\Omega$ indexes the given measurements, and $P_\Omega$ is a projection matrix which restricts a vector to its coefficients indexed by $\Omega$. The matrix $D$ is a sparsifying matrix under which $Dx$ is assumed to be sparse. Typical results in compressed sensing describe conditions under which one can guarantee recovery when the number of measurements $|\Omega|$ scales linearly with the sparsity, up to a log factor [...].
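To make the measurement model in (1.1) concrete, the following minimal sketch assumes that $V$ is the unitary discrete Fourier transform on $\mathbb{C}^N$. Under that assumption, $P_\Omega V x$ is simply a subset of the DFT coefficients of $x$. The helper name `subsampled_measurements` is hypothetical. Indices are 0-based, as in NumPy, rather than 1-based as in the text.

```python
import numpy as np

# Illustrative sketch; the helper name is hypothetical and V = unitary DFT is an assumption.
def subsampled_measurements(x, omega):
    """Return P_Omega V x, where V is the unitary DFT on C^N.

    x     : length-N real or complex vector
    omega : iterable of 0-based indices in {0, ..., N-1}
    """
    Vx = np.fft.fft(x, norm="ortho")          # unitary DFT, so ||Vx||_2 == ||x||_2
    idx = np.asarray(sorted(set(omega)), dtype=int)
    return Vx[idx]                            # keep only the coefficients indexed by omega

# Example: a 1-sparse vector of length 8 observed through 3 Fourier coefficients.
x = np.zeros(8)
x[2] = 1.0
y = subsampled_measurements(x, [0, 1, 5])
```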

A large part of the theoretical development of compressed sensing has revolved around the construction of random sampling matrices, such as matrices constructed from random Gaussian ensembles. In this setting, the choice of the samples is completely independent of the sparsifying system [...]. The use of overcomplete dictionaries in compressed sensing has also been studied in works such as [...]. Again, however, recovery guarantees were obtained only for randomised sampling matrices or for subsampled structured matrices with randomised column signs. In the majority of applications where compressed sensing has been of interest, one is concerned with recovering a signal from structured measurements, without the possibility of first randomising the underlying signal. For example, the measurements in magnetic resonance imaging (MRI) are modelled via the Fourier transform, while the measurements in radio interferometry are modelled via the Radon transform. In these cases, how one can achieve subsampling is highly dependent on the sparsifying transform. To explain this statement, we recall some compressed sensing results on the recovery of a vector of length $N$ from its discrete Fourier coefficients under various sparsifying transforms.
(1) If the underlying vector is $s$-sparse in its canonical basis, then one can guarantee perfect recovery from $\mathcal{O}(s\log N)$ Fourier coefficients drawn uniformly at random [...].

(2) If the underlying vector is $s$-sparse with respect to its total variation [5], then $\mathcal{O}(s\log N)$ Fourier coefficients drawn uniformly at random again guarantee perfect recovery. However, in the presence of noise and approximate sparsity, one can obtain superior error bounds with sampling strategies which instead sample more densely at low frequencies [34].

(3) If the underlying vector is $s$-sparse with respect to some wavelet basis, then it is impossible to guarantee recovery from $\mathcal{O}(s\log N)$ samples drawn uniformly at random. This phenomenon has been observed since the early days of compressed sensing. There have been extensive investigations into how subsampling remains achievable by sampling more densely at low frequencies [33], [31], [...], [42], [35]. These approaches are often referred to as variable density sampling, and theoretical guarantees for them were recently derived in [29] and [2].

More generally, whether one can sample uniformly at random depends on whether the sampling and sparsifying matrices are sufficiently incoherent. In the absence of incoherence, as in case (3) above, how one should choose $\Omega$ in (1.1) becomes a far more delicate issue. To explain the use of compressed sensing in this case, [2] developed a theoretical framework based on three new principles: multilevel sampling, asymptotic incoherence and asymptotic sparsity. By modelling nonuniform sampling strategies via multilevel sampling, one can attribute the need for dense sampling at low frequencies in (3) to the following two reasons.

(i) The high correspondence between Fourier and wavelet bases at low Fourier frequencies and low 
wavelet scales, but the low correspondence at high Fourier frequencies and high wavelet scales 
(asymptotic incoherence). 

(ii) Typical signals or images exhibit distinctive sparsity patterns in their wavelet coefficients, and 
become increasingly sparse at higher wavelet scales (asymptotic sparsity). 

Much of the compressed sensing literature bases the sampling strategy on sparsity alone. In contrast, the results of [2] demonstrated that one of the driving forces behind the success of variable density sampling is its correspondence to the sparsity structure of the underlying signals of interest. These new principles provide a framework for understanding how to devise optimal subsampling strategies [...]. Such strategies exploit both the sparsity structure of the underlying signal and the correspondences between the sampling and sparsifying systems.

1.1 Contribution and overview 

The paper [2] is concerned only with the case where the sparsifying system is an orthonormal basis. However, many of the sparsifying transforms used in applications are constructed from overcomplete dictionaries. Examples include contourlets [...], curvelets [...], shearlets [12, 30] and wavelet frames [...].

With this in mind, the recent work of [27] derives theoretical guarantees for certain nonuniform sampling strategies in the case of sparsity with respect to a tight frame. Given a sparsifying transform $D \in \mathbb{C}^{N\times n}$ and a sparsity level $s$, [27] defines the localization factor $\eta_{s,D}$ as

$$\eta_{s,D} = \eta = \sup\left\{ \frac{\|Dg\|_{\ell^1}}{\sqrt{s}} : |\Delta| = s,\ g \in \mathcal{R}(D^* P_\Delta),\ \|g\|_{\ell^2} = 1 \right\}. \tag{1.2}$$

Their result is as follows.

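Theorem 1.1 ([27]). Let $N \in \mathbb{N}$ and let $s < N$. Suppose that the rows $\{d_1,\ldots,d_N\}$ of $D \in \mathbb{C}^{N\times n}$ form a Parseval frame, and that the rows $\{v_1,\ldots,v_n\}$ of $V \in \mathbb{C}^{n\times n}$ form an orthonormal basis of $\mathbb{C}^n$. Suppose also that $\sup_{1\le j\le N} |\langle d_j, v_k\rangle| \le \mu_k$. Let $\nu$ be the probability measure on $\{1,\ldots,n\}$ given by $\nu(k) = \mu_k^2/\|\mu\|_{\ell^2}^2$, where $\mu = (\mu_k)_{k=1}^n$. Let $W \in \mathbb{C}^{n\times n}$ be the diagonal matrix with diagonal entries $(\|\mu\|_{\ell^2}/\mu_k)_{k=1}^n$. Let $\Omega$ be a set of $m$ independently and identically distributed indices drawn from $\{1,\ldots,n\}$ with the measure $\nu$. Suppose that

$$m \ge C\,\eta^2\|\mu\|_{\ell^2}^2\, s\max\left\{\log^3(s\eta^2)\log(N),\ \log(\epsilon^{-1})\right\}$$

for some absolute constant $C$. Then with probability $1-\epsilon$, the following holds for every $f \in \mathbb{C}^n$. Let $y = P_\Omega V f + e$, where the noise $e$ has weighted error $\|\frac{1}{\sqrt{m}} W e\|_{\ell^2} \le \delta$. The solution $\hat f$ of

$$\operatorname*{argmin}_{g\in\mathbb{C}^n} \|Dg\|_{\ell^1} \text{ subject to } \left\|\frac{1}{\sqrt{m}} W (P_\Omega V g - y)\right\|_{\ell^2} \le \delta \tag{1.3}$$

satisfies

$$\|\hat f - f\|_{\ell^2} \le C_1\delta + C_2\,\sigma_s(Df)\,s^{-1/2}.$$

Here $C_1$ and $C_2$ are absolute constants, and for any vector $x \in \mathbb{C}^N$, $\sigma_s(x) = \min\{\|x - z\|_{\ell^1} : z \in \mathbb{C}^N,\ |\mathrm{supp}(z)| \le s\}$.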

This theorem guarantees the recovery of all sparse vectors under a (fixed) nonuniform sampling distribution. However, it does not reveal any dependence between the sampling strategy and sparsity structure. For subsampled Fourier measurements, the result implies a sampling cardinality of $m = \mathcal{O}(s\log^3(s)\log^2(n))$ when $D$ is an orthonormal Haar wavelet basis. When $D$ is a redundant Haar frame, it implies $m = \mathcal{O}(s\log^3(s\log(n))\log^3(n))$. Because of the relatively large number of log factors, these sampling bounds are still substantially more pessimistic than what is often observed empirically. One possible reason is that the theorem does not account for structure dependence. In Section 2 we present a numerical example which explains why understanding this dependence is crucial to achieving subsampling.

Therefore, the purpose of this paper is to develop a theory of how to structure one's samples based on the sparsity structure with respect to a tight frame. The minimization problem tackled in this paper also differs slightly from (1.3). We consider solutions of the more standard problem (3.2), with a uniform noise assumption and no additional weighting factors. We also remark that a strong dependence between the sampling strategy and the underlying sparsity structure has a direct implication: there is no fixed optimal sampling distribution for all sparse signals. This is reflected in our main result, which accounts for recovery under various sampling distributions using the framework of multilevel sampling.

The outline of this paper is as follows. Section 3 recalls the key principles from [2], together with a result on solutions of (1.1) in the case where $D$ is constructed from an orthonormal basis. The main result of this paper is presented in Section 4. There we show how the main result of [2] can be extended to the case where $D$ is constructed from a tight frame. The remainder of this paper is devoted to proving the result of Section 4.


Notation. Given Banach spaces $X$ and $Y$, let $\mathcal{B}(X,Y)$ denote the space of bounded linear operators from $X$ to $Y$, and let $\mathcal{B}(X)$ denote the space of bounded linear operators from $X$ to $X$. Let $\mathcal{H}$ be a Hilbert space. Given any subspace $S \subset \mathcal{H}$, $Q_S$ denotes the orthogonal projection onto $S$. We say that $\{\varphi_j : j \in \mathbb{N}\}$ is a frame for $\mathcal{H}$ if there exist $c, C > 0$ such that

$$c\|g\|_{\mathcal{H}}^2 \le \sum_{j\in\mathbb{N}} |\langle g,\varphi_j\rangle|^2 \le C\|g\|_{\mathcal{H}}^2, \quad \forall g \in \mathcal{H}.$$

We say that $\{\varphi_j : j \in \mathbb{N}\}$ is a tight frame if $c = C$. If $c = C = 1$, then $\{\varphi_j : j \in \mathbb{N}\}$ is said to be a Parseval frame. Given any linear operator $U$, let $\mathcal{R}(U)$ denote its range and let $\mathcal{N}(U)$ denote its null space.
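The frame inequality can be checked numerically in finite dimensions. Let the rows of $\Phi$ be the frame vectors. The optimal constants $c$ and $C$ are then the smallest and largest eigenvalues of the frame operator $\Phi^*\Phi$. The sketch below uses the three-vector "Mercedes-Benz" frame of $\mathbb{R}^2$, a standard textbook example that is not taken from this paper. The helper `frame_bounds` is hypothetical.

```python
import numpy as np

# Illustrative sketch; frame_bounds is a hypothetical helper.
def frame_bounds(Phi):
    """Optimal frame bounds (c, C) of the rows of Phi: the extreme
    eigenvalues of the Hermitian frame operator Phi^* Phi."""
    S = Phi.conj().T @ Phi
    eig = np.linalg.eigvalsh(S)          # real eigenvalues in ascending order
    return eig[0], eig[-1]

# Three unit vectors at 120 degrees in R^2.
angles = 2 * np.pi * np.arange(3) / 3
Phi = np.stack([np.cos(angles), np.sin(angles)], axis=1)    # shape (3, 2)

c, C = frame_bounds(Phi)                        # tight frame: c == C == 3/2
c_p, C_p = frame_bounds(np.sqrt(2 / 3) * Phi)   # rescaled: Parseval, c == C == 1
```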

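We will also consider the sequence spaces $\ell^p(\mathbb{N})$ for $p \in [1,\infty]$. Let $\{e_j : j \in \mathbb{N}\}$ denote the canonical basis for the $\ell^p(\mathbb{N})$ space under consideration. Given any $\Delta \subset \mathbb{N}$, $P_\Delta$ denotes the orthogonal projection onto $\mathrm{span}\{e_j : j \in \Delta\}$. Given $M \in \mathbb{N}$, let $[M] := \{1,\ldots,M\}$. Given $z \in \ell^2(\mathbb{N})$, let $\mathrm{sgn}(z) \in \ell^\infty(\mathbb{N})$ be such that for each $j \in \mathbb{N}$,

$$\mathrm{sgn}(z)_j = \begin{cases} z_j/|z_j| & z_j \ne 0, \\ 0 & \text{otherwise}. \end{cases}$$

Given $q \in (0,\infty]$, the $\ell^q$ norm (or quasi-norm if $q \in (0,1)$) is defined for $z = (z_j)_{j\in\mathbb{N}}$ as

$$\|z\|_{\ell^q}^q = \sum_j |z_j|^q, \quad q \in (0,\infty), \qquad \|z\|_{\ell^\infty} = \sup_j |z_j|.$$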
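For finite vectors, the two definitions above translate directly into code. This is a minimal NumPy sketch, and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.
def sgn(z):
    """Entrywise sgn(z)_j = z_j / |z_j| if z_j != 0, and 0 otherwise."""
    z = np.asarray(z, dtype=complex)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = z[nz] / np.abs(z[nz])
    return out

def lq_norm(z, q):
    """ell^q norm for q in [1, inf]; quasi-norm for q in (0, 1)."""
    a = np.abs(np.asarray(z))
    if np.isinf(q):
        return a.max()
    return (a**q).sum() ** (1.0 / q)

z = np.array([3, -4j, 0])
```

For a fixed finite vector, $\|z\|_{\ell^q}^q$ tends to the number of nonzero entries of $z$ as $q \to 0$. This is why small $q$ serves as a proxy for sparsity.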
Let $\|\cdot\|_{\ell^p\to\ell^q}$ denote the operator norm of $\mathcal{B}(\ell^p(\mathbb{N}),\ell^q(\mathbb{N}))$ for $p,q \in [1,\infty]$. If $X$ and $Y$ are Hilbert spaces, we simply denote the operator norm of $\mathcal{B}(X,Y)$ by $\|\cdot\|$. Given $a,b \in \mathbb{R}$, $a \lesssim b$ denotes $a \le Cb$, where $C$ is a constant independent of all variables under consideration. The identity operator is denoted by $I$, and the space on which it is defined will be clear from context.

2 The need for structure dependent sampling 

To illustrate why sparsity structure must be accounted for when devising subsampling strategies, consider the recovery of finite dimensional vectors from a subset of their Fourier coefficients. The sparsifying system is a redundant discrete Haar wavelet frame, defined in detail in Appendix A. In the following example, $A$ denotes the discrete Fourier transform, and $D$ denotes the discrete Haar wavelet transform.
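The precise definition of the redundant Haar frame is deferred to Appendix A. The following sketch therefore uses one common construction as an assumption: a $J$-level undecimated (à trous) Haar transform with periodic boundary conditions. It builds the analysis matrix $D$ and checks the Parseval property $D^*D = I$. The helper name `undecimated_haar` is hypothetical, and the paper's own frame may differ in normalization or boundary handling.

```python
import numpy as np

# Illustrative sketch of ONE possible redundant Haar frame (an assumption,
# not necessarily the construction of Appendix A); the helper name is hypothetical.
def undecimated_haar(N, J):
    """Analysis matrix of a J-level undecimated Haar frame on C^N with
    periodic boundary conditions. Returns a ((J+1)N x N) matrix."""
    I = np.eye(N)
    rows = []
    low = I                                   # current low-pass operator
    for j in range(J):
        shift = np.roll(I, 2**j, axis=1)      # (shift @ x)_n = x_{(n + 2^j) mod N}
        H = 0.5 * (I + shift)                 # averaging filter at level j+1
        G = 0.5 * (I - shift)                 # differencing filter at level j+1
        rows.append(G @ low)                  # detail coefficients
        low = H @ low                         # pass averages to the next level
    rows.append(low)                          # coarsest averages
    return np.vstack(rows)

D = undecimated_haar(16, 3)
```

Each stage splits its input $y$ into $H_jy$ and $G_jy$ with $\|H_jy\|^2 + \|G_jy\|^2 = \|y\|^2$, so the whole matrix satisfies $D^*D = I$. By contrast, $DD^*$ is only a projection.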

A numerical example. Let $N = 1024$. We consider the recovery of the two signals $x_1$ and $x_2$ shown in Figure 1 from subsampled discrete Fourier coefficients, by solving (1.1). These signals are constructed such that $\|Dx_1\|_0 = \|Dx_2\|_0 = 100$. Here the sparsity measure of a vector is defined by $\|z\|_0 := |\{j : z_j \ne 0\}|$ for any $z \in \mathbb{C}^M$ with $M \in \mathbb{N}$. The sparsity patterns of $Dx_1$ and $Dx_2$ are shown in Figure 1. Observe that, compared to $Dx_2$, $Dx_1$ has a higher proportion of large coefficients with respect to the higher scale frame elements.

Let $\Omega_V$ index 130 of the rows of $A$ (12.7% subsampling): the 41 Fourier coefficients of lowest frequency, plus 89 of the remaining coefficients drawn uniformly at random. The reconstructions $R(x_1,\Omega_V)$ and $R(x_2,\Omega_V)$ are shown in the top row of Figure 2. The same sampling pattern is used for both reconstructions, and both signals have the same sparsity with respect to $D$. Nevertheless, $R(x_2,\Omega_V)$ is an exact reconstruction of $x_2$, whilst $R(x_1,\Omega_V)$ incurs a relative error of 34.85%. This simple example suggests that to subsample efficiently, it is not sufficient to consider sparsity alone.

We also remark that, unlike sampling with unstructured operators such as random Gaussian matrices, uniform random sampling yields poor reconstructions for both signals. The second row of Figure 2 shows the reconstructions $R(x_1,\Omega_U)$ and $R(x_2,\Omega_U)$, where $\Omega_U$ indexes 130 of the available coefficients chosen uniformly at random. Finally, the high frequency samples indexed by $\Omega_V$ are required for an exact reconstruction of $x_2$. An error is incurred when one simply samples the Fourier coefficients of lowest frequency (see the bottom row of Figure 2).
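The three sampling maps of this example can be generated as follows. This sketch rests on stated assumptions. DFT indices are 0-based and are ordered by $|\text{frequency}|$, with ties broken by index. The random generator and seed are arbitrary choices, so the sketch will not reproduce the exact sets used for Figure 2. The name `sampling_maps` is hypothetical.

```python
import numpy as np

# Illustrative sketch; names, tie-breaking and seed are assumptions.
def low_freq_order(N):
    """DFT indices 0..N-1 sorted by |frequency|; the stable sort keeps +k before -k."""
    freqs = np.rint(np.abs(np.fft.fftfreq(N) * N))
    return np.argsort(freqs, kind="stable")

def sampling_maps(N=1024, m=130, n_low=41, seed=0):
    """Return (Omega_V, Omega_U, Omega_L) as sorted 0-based index arrays."""
    rng = np.random.default_rng(seed)
    order = low_freq_order(N)
    omega_L = np.sort(order[:m])                                # m lowest frequencies
    extra = rng.choice(order[n_low:], size=m - n_low, replace=False)
    omega_V = np.sort(np.concatenate([order[:n_low], extra]))   # half-half
    omega_U = np.sort(rng.choice(N, size=m, replace=False))     # uniform random
    return omega_V, omega_U, omega_L

omega_V, omega_U, omega_L = sampling_maps()
```

With this ordering, the 41 lowest frequencies are exactly $0, \pm 1, \ldots, \pm 20$.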

Remark 2.1. Consider sampling the Fourier transform of a signal which is sparse with respect to some multiscale transform, such as wavelets, curvelets or shearlets. In this setting, it is now commonly observed that uniform random sampling yields far inferior results to variable density sampling patterns which focus on low frequencies. The numerical example in this section simply highlights this observation. It also reminds us that the performance of these variable density sampling patterns depends strongly on the sparsity structure of the underlying signal, not just on the sparsity level. Thus, there is a need for a theory which describes how the sparsity structure of the underlying signal should impact the choice of the sampling pattern.


3 Structured sampling with orthonormal systems 

The main result of this paper is an extension of the abstract result of [2] to the case where the sparsifying transform is a tight frame. This section recalls the key concepts introduced in [2] to analyse variable density sampling schemes for orthonormal sparsifying bases. Compressed sensing originally considered only finite dimensional vector spaces. However, the applications in which variable density sampling tends to be of interest are more naturally modelled on infinite dimensional Hilbert spaces. For this purpose, a Hilbert space framework for compressed sensing was introduced in [1] and [2].

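Let $\mathcal{H}$ be a Hilbert space, and let $\{\psi_j\}_{j\in\mathbb{N}}$ (the sampling vectors) and $\{\varphi_j\}_{j\in\mathbb{N}}$ (the sparsifying vectors) be orthonormal bases. Define the operators

$$V : \mathcal{H} \to \ell^2(\mathbb{N}),\ f \mapsto (\langle f,\psi_j\rangle)_{j\in\mathbb{N}}, \qquad D : \mathcal{H} \to \ell^2(\mathbb{N}),\ f \mapsto (\langle f,\varphi_j\rangle)_{j\in\mathbb{N}}. \tag{3.1}$$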
[Figure 1 panels: zoom of $x_1$; $x_2$; sparsity patterns $|Dx_1|$ and $|Dx_2|$.]

Figure 1: Top row: two test signals. Only a zoom of $x_1$ is shown, since it is supported only on the indices ranging between 100 and 158. Both signals have equal sparsity: for each $i = 1,2$, $\|Dx_i\|_0 = 100$. The second to the bottom rows show the reconstructions from different sampling maps. Bottom row: the sparsity structure of $Dx_1$ and $Dx_2$. The graph for $|Dx_2|$ has been capped at 20 to allow a clear comparison with $|Dx_1|$.
[Figure 2 panels: $\Omega_V$ (half-half), $\Omega_U$ (unif. rand.), $\Omega_L$ (low freq.); zoom of $R(x_1,\Omega_V)$, Err = 34.9%; $R(x_2,\Omega_V)$, Err = 0%; zoom of $R(x_1,\Omega_L)$, Err = 74.8%; $R(x_2,\Omega_L)$, Err = 5.0%.]

Figure 2: Reconstructions of $x_1$ and $x_2$ obtained by solving (1.1) with different sampling maps $\Omega$, each indexing 130 of the Fourier coefficients (12.7% subsampling). $\Omega_V$ indexes the first 41 coefficients of lowest frequencies, plus 89 of the remaining coefficients chosen uniformly at random. $\Omega_U$ indexes 130 of the coefficients chosen uniformly at random. $\Omega_L$ indexes the 130 coefficients of lowest frequencies.
Suppose we wish to recover some $f \in \mathcal{H}$ from samples of the form $y = (\langle f,\psi_j\rangle)_{j\in\Omega} + \eta = P_\Omega V f + \eta$. Here $\Omega \subset \mathbb{N}$, and $\eta$ is a noise vector of $\ell^2$-norm at most $\delta$. A key question in compressed sensing is how solutions to the following minimization problem exploit the sparsity of $f$ with respect to $D$ to obtain accurate recovery from a minimal number of samples:

$$\inf_{g\in\mathcal{H},\, Dg\in\ell^1(\mathbb{N})} \|Dg\|_{\ell^1} \text{ subject to } \|y - P_\Omega V g\|_{\ell^2} \le \delta. \tag{3.2}$$


The coherence of the operator $VD^*$ (defined below) has been recognized as an important factor in determining the minimal cardinality of the sampling set $\Omega$. It can be seen as a measure of the correlation between the sampling system associated with $V$ and the sparsifying system associated with $D$.


Definition 3.1 (Coherence). Let $U$ be a bounded linear operator on $\ell^2(\mathbb{N})$ (or let $U \in \mathbb{C}^{N\times N}$ for some $N \in \mathbb{N}$) such that $\|Ue_j\|_{\ell^2} = 1$ for all $j \in \mathbb{N}$ (or $j = 1,\ldots,N$). Let $\{e_j : j \in \mathbb{N}\}$ be the canonical basis of $\ell^2(\mathbb{N})$ (or $\mathbb{C}^N$). The coherence of $U$ is defined as $\mu(U) = \sup_{k,j} |\langle Ue_j, e_k\rangle|$.
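For a finite matrix, Definition 3.1 amounts to taking the largest entry in modulus. The sketch below, with a hypothetical helper `coherence`, computes it in two cases. For the unitary DFT against the canonical basis, every entry has modulus $N^{-1/2}$. The identity, by contrast, is maximally coherent.

```python
import numpy as np

# Illustrative sketch; coherence is a hypothetical helper.
def coherence(U):
    """mu(U) = max_{k,j} |<U e_j, e_k>|, i.e. the largest entry of U in modulus.
    Assumes unit-norm columns, as required by Definition 3.1."""
    U = np.asarray(U)
    assert np.allclose(np.linalg.norm(U, axis=0), 1.0), "columns must have unit norm"
    return np.abs(U).max()

N = 64
F = np.fft.fft(np.eye(N), norm="ortho")    # unitary DFT matrix
mu_fourier_spikes = coherence(F)           # Fourier vs canonical basis: 1/sqrt(N)
mu_identity = coherence(np.eye(N))         # maximally coherent: 1
```

In the first case $\mu^2(VD^*)N = 1$. The bound $\mathcal{O}(s\,\mu^2(VD^*)\,N\log N)$ of [7], discussed next, therefore reduces to $\mathcal{O}(s\log N)$, in line with case (1) of the introduction.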


Consider the case where $VD^* \in \mathbb{C}^{N\times N}$ is a finite dimensional isometry and $f$ is $s$-sparse. The main result of [7] showed that if $\Omega \subset \{1,\ldots,N\}$ consists of $\mathcal{O}(s\,\mu^2(VD^*)\,N\log N)$ samples drawn uniformly at random, then any solution $\hat f$ to (3.2) satisfies $\|\hat f - f\| \le C\delta$ for some universal constant $C > 0$. Furthermore, one cannot improve upon the estimate $\mathcal{O}(s\,\mu^2(VD^*)\,N\log N)$. Thus, for the recovery of sparse signals, the minimal sampling cardinality is completely determined by this coherence quantity.

Unfortunately, when $\mu(VD^*) \approx 1$, this result merely concludes that $\Omega$ must index all available samples. This is especially problematic when $VD^*$ is a bounded linear operator on the infinite dimensional Hilbert space $\ell^2(\mathbb{N})$. In that case, necessarily $\mu(VD^*) \ge c > 0$ for some constant $c$. Consequently, one cannot expect the coherence of any finite dimensional discretization of $VD^*$ to be of order $\mathcal{O}(N^{-1/2})$ (see [2] for a detailed explanation of this phenomenon). When $V$ is associated with a Fourier basis and $D$ with a wavelet basis, necessarily $\mu(VD^*) = 1$.

The key idea of [2] is that additional assumptions on the sparsity or compressibility structure of the underlying signal allow nontrivial statements about how $\Omega$ should be chosen in accordance with the underlying sparsity. Consider the task of drawing samples from the first $M$ samples to accurately recover $f \in \mathcal{H}$, where $\|P_\Delta^\perp Df\|_{\ell^1} \ll \|f\|_{\mathcal{H}}$ for some $\Delta \subset \{1,\ldots,N\}$ with $|\Delta| = s$. One approach is to divide the sampling and sparsifying vectors into levels, and then analyse the correspondence between the different sampling and sparsifying levels. The main theoretical result of [2] is based on three principles:


• Multilevel sampling - instead of considering sampling uniformly at random across all available 
samples, partition the samples into levels and consider sampling uniformly at random with different 
densities at each level. This model was introduced to analyse the effects of nonuniform sampling 
patterns. 

• Local coherence - the coherence of partial sections of VD*. 

• Sparsity in levels - instead of considering sparsity across all available coefficients, partition the 
coefficients into levels and consider the sparsity within each level. 


We define each of these concepts below. 


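Definition 3.2 (Multilevel sampling). Let $r \in \mathbb{N}$ and $\mathbf{M} = (M_1,\ldots,M_r) \in \mathbb{N}^r$ with $0 = M_0 < M_1 < \cdots < M_r$. Let $\mathbf{m} = (m_1,\ldots,m_r) \in \mathbb{N}^r$ with $m_k \le M_k - M_{k-1}$, $k = 1,\ldots,r$. Suppose that the sets

$$\Omega_k \subseteq \{M_{k-1}+1,\ldots,M_k\}, \quad |\Omega_k| = m_k, \quad k = 1,\ldots,r,$$

are chosen uniformly at random. We refer to the set $\Omega = \Omega_{\mathbf{M},\mathbf{m}} = \Omega_1\cup\cdots\cup\Omega_r$ as an $(\mathbf{M},\mathbf{m})$-multilevel sampling scheme.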
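A direct implementation of Definition 3.2 draws each $\Omega_k$ without replacement from its own band. This is a minimal sketch. The helper name is hypothetical, indices are kept 1-based to match the definition, and NumPy's random generator stands in for "chosen uniformly at random".

```python
import numpy as np

# Illustrative sketch; the helper name is hypothetical, indices are 1-based.
def multilevel_sampling(M, m, rng=None):
    """(M, m)-multilevel sampling scheme of Definition 3.2.

    M : increasing level boundaries (M_1, ..., M_r), with M_0 = 0 implied
    m : samples per level (m_1, ..., m_r), with m_k <= M_k - M_{k-1}
    Returns the sorted union Omega_1 U ... U Omega_r.
    """
    rng = np.random.default_rng(rng)
    bounds = [0, *M]
    omega = []
    for k, mk in enumerate(m):
        lo, hi = bounds[k], bounds[k + 1]
        if not 0 <= mk <= hi - lo:
            raise ValueError(f"need 0 <= m_{k + 1} <= M_{k + 1} - M_{k}")
        # Omega_k: m_k distinct indices drawn uniformly from {lo+1, ..., hi}
        omega.append(rng.choice(np.arange(lo + 1, hi + 1), size=mk, replace=False))
    return np.sort(np.concatenate(omega))

omega = multilevel_sampling(M=[8, 32, 128], m=[8, 12, 20], rng=1)
```

Setting $m_1 = M_1$, as here, fully samples the first level. This mirrors the fully sampled low frequencies of $\Omega_V$ in Section 2.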
3.1 Sparsity in levels

The notion of sparsity in levels is defined as follows. As explained below, this notion is particularly 
important when considering wavelet sparsity for imaging purposes. 

Definition 3.3 (Sparsity in levels). Let $x$ be an element of either $\mathbb{C}^N$ or $\ell^2(\mathbb{N})$. For $r \in \mathbb{N}$, let $\mathbf{N} = (N_1,\ldots,N_r) \in \mathbb{N}^r$ with $0 = N_0 < N_1 < \cdots < N_r$. Let $\mathbf{s} = (s_1,\ldots,s_r) \in \mathbb{N}^r$ with $s_k \le N_k - N_{k-1}$, $k = 1,\ldots,r$. We say that $x$ is $(\mathbf{s},\mathbf{N})$-sparse if, for each $k = 1,\ldots,r$, the set $\Delta_k := \mathrm{supp}(x)\cap\{N_{k-1}+1,\ldots,N_k\}$ satisfies $|\Delta_k| \le s_k$. We denote the set of $(\mathbf{s},\mathbf{N})$-sparse vectors by $\Sigma_{\mathbf{s},\mathbf{N}}$.
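For finite vectors of length $N_r$, the level sparsities $|\Delta_k|$ and the $(\mathbf{s},\mathbf{N})$-sparsity test can be computed as follows. This sketch assumes $x \in \mathbb{C}^{N_r}$ and uses 0-based slicing, so the level $\{N_{k-1}+1,\ldots,N_k\}$ corresponds to `x[N_{k-1}:N_k]`. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, x is assumed to lie in C^{N_r}.
def level_sparsities(x, N):
    """Return (|Delta_1|, ..., |Delta_r|), where Delta_k is the support of x
    inside level k, stored as x[N_{k-1}:N_k]."""
    x = np.asarray(x)
    if x.size != N[-1]:
        raise ValueError("x must have length N_r")
    bounds = [0, *N]
    return np.array([np.count_nonzero(x[bounds[k]:bounds[k + 1]])
                     for k in range(len(N))])

def is_sN_sparse(x, s, N):
    """True iff x is (s, N)-sparse in the sense of Definition 3.3."""
    return bool(np.all(level_sparsities(x, N) <= np.asarray(s)))

x = np.array([1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 3.0])
```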

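Definition 3.4 ($(\mathbf{s},\mathbf{N})$-term approximation). Let $x = (x_j)$ be an element of either $\mathbb{C}^N$ or $\ell^2(\mathbb{N})$. We define the $(\mathbf{s},\mathbf{N})$-term approximation

$$\sigma_{\mathbf{s},\mathbf{N}}(x) = \min_{\eta\in\Sigma_{\mathbf{s},\mathbf{N}}} \|x - \eta\|_{\ell^1}. \tag{3.3}$$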
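The $\ell^1$ error in (3.3) splits over the levels. Consequently, the minimizer keeps the $s_k$ largest-magnitude entries in each level and discards the rest. The following is a minimal sketch for $x \in \mathbb{C}^{N_r}$; the helper name is hypothetical.

```python
import numpy as np

# Illustrative sketch; the helper name is hypothetical, x is assumed to lie in C^{N_r}.
def sigma_sN(x, s, N):
    """(s, N)-term approximation error (3.3): in each level keep the s_k
    largest-magnitude entries and sum the moduli of the discarded ones."""
    a = np.abs(np.asarray(x))
    if a.size != N[-1]:
        raise ValueError("x must have length N_r")
    bounds = [0, *N]
    err = 0.0
    for k, sk in enumerate(s):
        level = np.sort(a[bounds[k]:bounds[k + 1]])     # ascending magnitudes
        err += level[:max(level.size - sk, 0)].sum()    # all but the s_k largest
    return err

x = np.array([3, -1, 0.5, 0, 2, -4, 1, 0.25])
```

For example, with $\mathbf{N} = (3,8)$ and $\mathbf{s} = (1,2)$, the first level discards $1 + 0.5$ and the second discards $1 + 0.25$. This gives $\sigma_{\mathbf{s},\mathbf{N}}(x) = 2.75$.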


In addition to the level sparsities $s_k$ defined in Definition 3.3, we also require the notion of a relative sparsity. It takes into account the sampling operator $V$ and captures how different levels interfere with each other.


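Definition 3.5 (Relative sparsity). Let $V, D \in \mathcal{B}(\mathcal{H},\mathcal{H}')$, where $\mathcal{H}$ is a Hilbert space and $\mathcal{H}'$ is either $\mathbb{C}^N$ or $\ell^2(\mathbb{N})$. Let $\mathbf{s} = (s_i)_{i=1}^r \in \mathbb{N}^r$, $\mathbf{N} = (N_i)_{i=1}^r \in \mathbb{N}^r$ and $\mathbf{M} = (M_i)_{i=1}^r \in \mathbb{N}^r$, with $0 = N_0 < N_1 < \cdots < N_r$ and $0 = M_0 < M_1 < \cdots < M_r$. For $1 \le k \le r$, the $k$th relative sparsity is given by

$$S_k = S_k(\mathbf{N},\mathbf{M},\mathbf{s}) = \max_{g\in\Theta} \|P_{\Gamma_k} V g\|^2.$$

Here $\Gamma_k = (M_{k-1},M_k]\cap\mathbb{N}$, and $\Theta$ is the set

$$\Theta = \{g \in \mathcal{H} : \|Dg\|_{\ell^\infty} \le 1,\ |\mathrm{supp}(P_{\Delta_l} Dg)| = s_l,\ l = 1,\ldots,r\},$$

where $\Delta_l = (N_{l-1},N_l]\cap\mathbb{N}$.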


The Fourier/wavelets case 

On level sparsities. It has been established that natural images are not simply sparse in their wavelet coefficients, but exhibit a distinctive "tree structure" in their coefficients [...]. Let $\{\varphi_j\}_{j\in\mathbb{N}}$ be a wavelet basis. A typical image with a sparse wavelet approximation, supported on an index set $\Delta$, is often not sparse with respect to the wavelets of low scales. It becomes increasingly sparse with respect to the wavelets of higher scales. In particular, suppose $\{N_k\}_{k\in\mathbb{N}}$ corresponds to the wavelet scales, so that $\{\varphi_j\}_{j\le N_k}$ consists of all wavelets up to the $k$th scale. Let $s_k = |\Delta\cap(N_{k-1},N_k]|$ be the sparsity at the $k$th wavelet scale. One typically observes that although $s_1/N_1 \approx 1$, there is asymptotic sparsity: $s_k/(N_k - N_{k-1}) \to 0$ as $k$ increases. This phenomenon is illustrated in Figure 3.

Thus, for the purpose of reconstructing natural images, it is perhaps too general to consider the 
recovery of all sparse wavelet coefficients and it suffices to consider the recovery of images whose sparse 
representations exhibit asymptotic sparsity. This is the motivation behind the concept of sparsity in 
levels. 
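Asymptotic sparsity can be observed even in a toy one-dimensional setting. The sketch below makes two assumptions. It uses an orthonormal Haar wavelet transform rather than the Daubechies-4 wavelets of Figure 3, and its piecewise constant test signal is our own choice. It reports the fraction $s_k/(N_k - N_{k-1})$ of nonzero coefficients at each scale. The helper `haar_dwt` is hypothetical.

```python
import numpy as np

# Illustrative sketch; haar_dwt is a hypothetical helper and the signal is made up.
def haar_dwt(x):
    """Orthonormal Haar DWT of a vector whose length is a power of 2.
    Returns [approximation, coarsest details, ..., finest details]."""
    a = np.asarray(x, dtype=float)
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))   # wavelet coefficients at this scale
        a = (even + odd) / np.sqrt(2)               # approximation for the next scale
    return [a] + details[::-1]                      # reorder from coarse to fine

# Piecewise constant signal of length 1024 with jumps at 100, 300 and 700.
x = np.zeros(1024)
x[100:300] = 1.0
x[300:700] = -2.0

levels = haar_dwt(x)
counts = [int(np.count_nonzero(np.abs(c) > 1e-10)) for c in levels]   # s_k
fractions = [cnt / c.size for cnt, c in zip(counts, levels)]          # s_k / (N_k - N_{k-1})
```

Each jump lies inside the support of at most one Haar wavelet per scale, so $s_k \le 3$ at every detail scale. Meanwhile, the level sizes double from one scale to the next. The fractions therefore decay towards zero, whereas the single coarsest coefficient is nonzero.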

On relative sparsities. Suppose $V$ is the Fourier sampling operator and $D$ is the analysis operator associated with an orthonormal wavelet basis. One can then show that the change of basis matrix $VD^* \in \mathcal{B}(\ell^2(\mathbb{N}))$ is nearly block diagonal. Letting $\mathbf{M}$ and $\mathbf{N}$ correspond to wavelet scales, we obtain

$$S_k(\mathbf{N},\mathbf{M},\mathbf{s}) \lesssim \sum_{j=1}^r s_j A^{-|j-k|}$$

for some $A > 1$ which depends only on the given wavelet basis. Thus the dependence of the $k$th relative sparsity on each $s_j$ decays exponentially in $|j-k|$. Moreover, it follows that $\sum_k S_k \lesssim \sum_k s_k$. The reader is referred to [2] for a proof.
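The last claim follows from a geometric series. Summing the bound over $k$ and exchanging the order of summation gives

$$\sum_{k=1}^r S_k \lesssim \sum_{j=1}^r s_j\sum_{k=1}^r A^{-|j-k|} \le \sum_{j=1}^r s_j\left(1 + 2\sum_{d=1}^\infty A^{-d}\right) = \frac{A+1}{A-1}\sum_{j=1}^r s_j.$$

The factor $(A+1)/(A-1)$ depends only on the wavelet basis.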
Figure 3: Left: reconstruction from the largest 6% of the Daubechies-4 wavelet coefficients of a 1024 × 1024 image. Centre: location of coefficients in the sparse representation; coefficients are ordered in increasing wavelet scales away from the top left corner. Right: fraction of coefficients at each wavelet scale $k$ which contribute to the sparse representation.

3.2 Local coherence 

Coherence between the sampling and sparsifying systems is crucial for understanding the minimal sampling cardinality required to recover sparse signals. However, for some important systems in applications, coherence alone is simply too crude a measure. Instead, we require the more refined notion of local coherence.

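Definition 3.6 (Local coherence). Let $V, D \in \mathcal{B}(\mathcal{H},\mathcal{H}')$, where $\mathcal{H}$ is a Hilbert space and $\mathcal{H}'$ is either $\mathbb{C}^N$ or $\ell^2(\mathbb{N})$. Let $\mathbf{N} = (N_1,\ldots,N_r) \in \mathbb{N}^r$ and $\mathbf{M} = (M_1,\ldots,M_r) \in \mathbb{N}^r$, with $0 = N_0 < N_1 < \cdots < N_r$ and $0 = M_0 < M_1 < \cdots < M_r$. For $k = 1,\ldots,r$, let $\Gamma_k = \{M_{k-1}+1,\ldots,M_k\}$. For $k = 1,\ldots,r-1$, let $\Delta_k = \{N_{k-1}+1,\ldots,N_k\}$, and let $\Delta_r = \{n \in \mathbb{N} : n > N_{r-1}\}$. The $(k,l)$th local coherence between $V$ and $D$ with respect to $\mathbf{N}$ and $\mathbf{M}$ is given by

$$\mu_{\mathbf{N},\mathbf{M}}(k,l) = \sqrt{\mu(P_{\Gamma_k}VD^*P_{\Delta_l})\,\mu(P_{\Gamma_k}VD^*)}, \quad k,l = 1,\ldots,r.$$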
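For a finite section of $VD^*$, the local coherences of Definition 3.6 reduce to maxima of entries over blocks. The following is a minimal sketch with a hypothetical helper name. It treats the last column level as running to the end of the matrix, which is an assumption for finite sections.

```python
import numpy as np

# Illustrative sketch; the helper name is hypothetical.
def local_coherences(U, M, N):
    """(k, l)-th local coherences for a finite matrix U = V D^*.

    Rows are split at M = (M_1, ..., M_r) into Gamma_k and columns at
    N = (N_1, ..., N_r) into Delta_l; the last column level runs to the end.
    mu(k, l) = sqrt( max|block (Gamma_k, Delta_l)| * max|rows Gamma_k| ).
    """
    A = np.abs(np.asarray(U))
    r = len(M)
    rb = [0, *M]
    cb = [0, *N[:-1], A.shape[1]]
    out = np.empty((r, r))
    for k in range(r):
        rows = A[rb[k]:rb[k + 1], :]
        mu_row = rows.max()                            # mu(P_{Gamma_k} U)
        for l in range(r):
            block = rows[:, cb[l]:cb[l + 1]]           # P_{Gamma_k} U P_{Delta_l}
            out[k, l] = np.sqrt(block.max() * mu_row)
    return out
```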

The Fourier/wavelets case

If $VD^*$ is constructed from any orthonormal wavelet basis with Fourier sampling, then necessarily $\mu(VD^*) = 1$. However, only the initial section of $VD^*$, associated with low Fourier frequencies and low wavelet scales, has high coherence. In particular, one can show that

$$\mu(P_{[N]}^\perp VD^*),\ \mu(VD^*P_{[N]}^\perp) = \mathcal{O}(N^{-1/2}).$$

Finally, we remark that this property of asymptotic incoherence (decay in the coherence away from 
initial finite sections) is not unique to the Fourier/wavelets case, but can also be observed for other 
representation systems such as Fourier/Legendre polynomial systems. In the Fourier/wavelets case, it 
is this decay in the local coherences that makes it possible to exploit sparsity to subsample the Fourier 
coefficients. 
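Asymptotic incoherence can be checked numerically on a finite section. The sketch below makes two illustrative choices, with $n = 64$. It uses the unitary DFT with rows reordered by $|\text{frequency}|$, and an orthonormal Haar basis ordered from coarse to fine. It computes $\mu(VD^*)$ and the tail coherences $\max\{\mu(P_{[n_0]}^\perp VD^*), \mu(VD^*P_{[n_0]}^\perp)\}$. The helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical, n = 64 is an arbitrary choice.
def haar_analysis_matrix(n):
    """Orthogonal matrix whose rows are the orthonormal Haar basis of R^n
    (n a power of 2), ordered coarse to fine: scaling function first."""
    D = np.empty((n, n))
    for j in range(n):
        a = np.eye(n)[j]
        details = []
        while a.size > 1:
            even, odd = a[0::2], a[1::2]
            details.append((even - odd) / np.sqrt(2))
            a = (even + odd) / np.sqrt(2)
        D[:, j] = np.concatenate([a] + details[::-1])   # column j = D e_j
    return D

def tail_coherence(U, n0):
    """max(mu(P_[n0]^perp U), mu(U P_[n0]^perp)) for a square matrix U."""
    A = np.abs(U)
    return max(A[n0:, :].max(), A[:, n0:].max())

n = 64
F = np.fft.fft(np.eye(n), norm="ortho")                           # rows = frequencies
order = np.argsort(np.rint(np.abs(np.fft.fftfreq(n) * n)), kind="stable")
U = F[order] @ haar_analysis_matrix(n).T                          # V D^*, rows by |frequency|

full = np.abs(U).max()                                            # mu(V D^*)
tails = [tail_coherence(U, n0) for n0 in (4, 8, 16, 32)]
```

Here $\mu(VD^*) = 1$ because the zero-frequency Fourier vector coincides with the Haar scaling function. The tail coherences, by contrast, do not increase with $n_0$ and stay well below 1.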

3.3 Recovery guarantees in the case of orthonormal sparsifying transforms 

When we are considering the recovery of an infinite dimensional object by drawing finitely many samples, 
one may ask the following question: What is the range of the samples, M, that we should sample from in 
order to recover a sparse representation with respect to the first N sparsifying elements? This question 
is addressed by the balancing property. 

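Definition 3.7 (Balancing property [2]). Let $VD^* \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. Then $M \in \mathbb{N}$ and $K \ge 1$ satisfy the balancing property with respect to $V$, $D$, $N \in \mathbb{N}$ and $s \in \mathbb{N}$ if

$$\|P_{[N]}DV^*P_{[M]}VD^*P_{[N]} - P_{[N]}\|_{\ell^\infty\to\ell^\infty} \le \frac{1}{8}\left(\log_2^{1/2}(4\sqrt{s}KM)\right)^{-1} \tag{3.4}$$

and

$$\|P_{[N]}^\perp DV^*P_{[M]}VD^*P_{[N]}\|_{\ell^\infty\to\ell^\infty} \le \frac{1}{8}, \tag{3.5}$$

where $\|\cdot\|_{\ell^\infty\to\ell^\infty}$ is the norm on $\mathcal{B}(\ell^\infty(\mathbb{N}))$.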
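On a finite section of $VD^*$, the two balancing conditions can be evaluated directly. This works because the $\ell^\infty\to\ell^\infty$ operator norm of a matrix is its maximum absolute row sum. The sketch below uses right-hand sides $\frac{1}{8}(\log_2(4\sqrt{s}KM))^{-1/2}$ and $\frac{1}{8}$. It truncates $P_{[N]}^\perp$ to the remaining columns of the section, which only approximates the infinite-dimensional condition. The helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch on a finite section; helper names are hypothetical.
def linf_op_norm(A):
    """Norm on B(ell^inf): maximum absolute row sum (0 for an empty matrix)."""
    return float(np.abs(A).sum(axis=1).max()) if A.size else 0.0

def balancing_property(U, M, N, K, s):
    """Check the balancing conditions for a finite section U of V D^*
    (rows: samples, columns: sparsifying elements); P_[M] keeps the
    first M rows and P_[N] the first N columns (0-based slicing)."""
    UM = U[:M, :]                                   # P_[M] V D^*
    G = UM.conj().T @ UM                            # D V^* P_[M] V D^*
    lhs1 = linf_op_norm(G[:N, :N] - np.eye(N))      # P_[N] G P_[N] - P_[N]
    lhs2 = linf_op_norm(G[N:, :N])                  # P_[N]^perp G P_[N], truncated
    rhs1 = 1 / (8 * np.sqrt(np.log2(4 * np.sqrt(s) * K * M)))
    return bool(lhs1 <= rhs1 and lhs2 <= 1 / 8)
```

For the identity, choosing $M \ge N$ satisfies both conditions. Choosing $M < N$ fails the first condition, because the columns beyond $M$ are never sampled.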

We now recall the main result of [2], which describes how multilevel sampling depends on the local coherences and the underlying sparsity structure. For this, we require the following notation:

$$\tilde M = \min\left\{ i \in \mathbb{N} : \max_{k\ge i} \|P_{[M]}VD^*e_k\| \le \frac{1}{32\,q^{-1}\sqrt{s}} \right\},$$

where $M$, $s$ and $q$ are as defined in Theorem 3.8 below.

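Theorem 3.8. Let $VD^*$ be an isometry either on $\ell^2(\mathbb{N})$ or $\mathbb{C}^N$, and let $f \in \mathcal{H}$. Suppose that $\Omega = \Omega_{\mathbf{M},\mathbf{m}}$ is a multilevel sampling scheme, where $\mathbf{M} = (M_1,\ldots,M_r) \in \mathbb{N}^r$ and $\mathbf{m} = (m_1,\ldots,m_r) \in \mathbb{N}^r$. Let $\mathbf{N} = (N_1,\ldots,N_r) \in \mathbb{N}^r$ with $N_1 < \cdots < N_r$, and let $\mathbf{s} = (s_1,\ldots,s_r) \in \mathbb{N}^r$. Let $(\mathbf{s},\mathbf{N})$ be any pair such that the following holds:

(i) the parameters $M = M_r$ and $q^{-1} = \max_{k=1,\ldots,r}\{(M_k - M_{k-1})/m_k\}$ satisfy the balancing property with respect to $V$, $D$, $N := N_r$ and $s := s_1 + \cdots + s_r$;

(ii) for $\epsilon \in (0,e^{-1}]$ and $1 \le k \le r$,

$$1 \gtrsim \frac{M_k - M_{k-1}}{m_k}\,\log(s\epsilon^{-1})\left(\sum_{l=1}^r \mu_{\mathbf{N},\mathbf{M}}(k,l)^2\,s_l\right)\log\left(q^{-1}\tilde M\sqrt{s}\right),$$

and $m_k \gtrsim \hat m_k\log(s\epsilon^{-1})\log(q^{-1}\tilde M\sqrt{s})$, where $\hat m_k$ is such that

$$1 \gtrsim \sum_{k=1}^r\left(\frac{M_k - M_{k-1}}{\hat m_k} - 1\right)\mu_{\mathbf{N},\mathbf{M}}(k,l)^2\,\hat s_k, \quad l = 1,\ldots,r, \tag{3.6}$$

for all $(\hat s_1,\ldots,\hat s_r) \in \mathbb{R}_+^r$ with $\hat s_1 + \cdots + \hat s_r \le s_1 + \cdots + s_r$ and $\hat s_k \le S_k(\mathbf{N},\mathbf{M},\mathbf{s})$ for each $k = 1,\ldots,r$.

Suppose that $\hat f$ is a minimizer of (3.2) with $y = P_\Omega Vf + \eta$ and $\|\eta\|_{\ell^2} \le \delta$. Then, with probability exceeding $1-\epsilon$,

$$\|\hat f - f\| \le C\left(q^{-1/2}\,\delta\,(1 + L\sqrt{s}) + \sigma_{\mathbf{s},\mathbf{N}}(Df)\right)$$

for some constant $C$. Here $\sigma_{\mathbf{s},\mathbf{N}}$ is as in (3.3), and $L = C\left(1 + \frac{\sqrt{\log_2(6\epsilon^{-1})}}{\log_2(4q^{-1}M\sqrt{s})}\right)$. If $m_k = M_k - M_{k-1}$ for $1 \le k \le r$, then this holds with probability 1.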


Notice that the number of samples in each level depends on three quantities: the local coherences between $V$ and $D$, the level sparsities $\{s_k\}$, and the relative level sparsities $\{S_k\}$. As discussed in [2], the relative level sparsities account for the interference between the different sampling and sparsifying levels, and they cannot be removed from the estimates. However, recall the case of Fourier sampling with wavelet sparsity, where the levels correspond to the wavelet scales. There one can essentially show that the dependence of $S_k$ on each $s_j$ becomes exponentially small as $|k - j|$ increases.

This result first suggests that even when incoherence is missing, subsampling in accordance with sparsity is still possible, provided the sampling and sparsifying bases are not uniformly coherent. In other words, subsampling is possible when the local coherences are small. The result also suggests that a change in the sparsity structure, i.e. in the distribution of $\{s_k\}$ and $\{S_k\}$, should result in a change in the sampling strategy.
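To see how the sparsity structure feeds into the sampling strategy, the following sketch evaluates the leading term of the first condition in (ii) of Theorem 3.8. That term is $m_k \approx (M_k - M_{k-1})\sum_l \mu_{\mathbf{N},\mathbf{M}}(k,l)^2 s_l$, which the sketch caps at the level size. It is a heuristic only. It ignores constants, the condition (3.6) on $\hat m_k$ and the relative sparsities $S_k$, and it collects the log factors into a single user-supplied `log_factor`. The local coherence values in the example are made up for illustration, and the helper name is hypothetical.

```python
import numpy as np

# Illustrative heuristic only; the helper name is hypothetical and mu_loc is made up.
def level_sample_counts(M, mu_loc, s, log_factor=1.0):
    """Heuristic per-level counts
    m_k ~ (M_k - M_{k-1}) * log_factor * sum_l mu_loc[k, l]^2 * s_l,
    capped at the level size M_k - M_{k-1}. Constants are dropped."""
    sizes = np.diff([0, *M])
    demand = sizes * log_factor * (np.asarray(mu_loc, dtype=float) ** 2 @ np.asarray(s))
    return np.minimum(sizes, np.ceil(demand)).astype(int)

# Made-up local coherences decaying away from the diagonal and with scale.
mu_loc = np.array([[1.0, 0.5, 0.25],
                   [0.5, 0.25, 0.125],
                   [0.25, 0.125, 0.0625]])
M = [4, 16, 64]
m_a = level_sample_counts(M, mu_loc, s=[4, 6, 8])    # baseline sparsity pattern
m_b = level_sample_counts(M, mu_loc, s=[4, 6, 16])   # more nonzeros at the finest level
```

Increasing $s_3$ from 8 to 16 raises the demand in the last sampling level from 18 to 20 samples. The first two levels, already saturated here, are unchanged.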
4 Main result


The work of [2] provides an initial understanding of how to structure sampling in accordance with the underlying sparsity structure, so that the number of samples required is linear in the sparsity (up to log factors). A natural extension is to consider this question when $D$ is the analysis operator of a tight frame rather than an orthonormal basis. This case is of particular interest given the recent development of sparse representations with respect to multiscale systems such as wavelet, curvelet and shearlet frames. In this paper, we consider the case where $V : \mathcal{H} \to \ell^2(\mathbb{N})$ and $D : \mathcal{H} \to \ell^2(\mathbb{N})$ are isometries. This assumption simply states that $V$ and $D$ are the analysis operators of Parseval frames, i.e. $\{\psi_j\}_{j\in\mathbb{N}}$ and $\{\varphi_j\}_{j\in\mathbb{N}}$ in (3.1) are both Parseval frames of $\mathcal{H}$.


Note that if $D$ is associated with an orthonormal basis instead of a Parseval frame (i.e. $D$ is unitary), then (3.2) is equivalent to

$$\inf_{x\in\ell^1(\mathbb{N})} \|x\|_{\ell^1} \text{ subject to } \|y - P_\Omega VD^*x\|_{\ell^2} \le \delta. \tag{4.1}$$


This minimization problem is referred to as synthesis regularization. For non-orthonormal systems, however, (3.2) (often referred to as analysis regularization) and (4.1) are no longer equivalent. Some of the differences between synthesis and analysis regularization were investigated in [...]. The majority of theoretical works in compressed sensing have focused on synthesis regularization, and the theory behind solutions of the analysis regularization problem (3.2) is less comprehensive.
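The loss of equivalence between (3.2) and (4.1) stems from a basic fact about Parseval frames: $D^*D = I$, but $DD^*$ is only the orthogonal projection onto $\mathcal{R}(D)$. The sketch below illustrates this with a Parseval frame of $\mathbb{C}^4$. As an illustrative assumption, the frame is formed by the canonical and unitary DFT bases, each scaled by $1/\sqrt{2}$. For $g = D^*x$ with a 1-sparse $x$, the analysis coefficients $Dg = DD^*x$ are no longer 1-sparse and have a different $\ell^1$ norm.

```python
import numpy as np

# Illustrative example frame (an assumption): canonical basis and unitary DFT
# basis of C^4, both scaled by 1/sqrt(2), stacked into a 2n x n analysis matrix.
n = 4
F = np.fft.fft(np.eye(n), norm="ortho")
D = np.vstack([np.eye(n), F]) / np.sqrt(2)

gram_DsD = D.conj().T @ D            # D^* D = I: D is an isometry
gram_DDs = D @ D.conj().T            # D D^*: projection onto R(D), not the identity

x = np.zeros(2 * n, dtype=complex)
x[0] = 1.0                           # 1-sparse coefficient vector, ||x||_1 = 1
g = D.conj().T @ x                   # synthesis: g = D^* x
analysis_coeffs = D @ g              # analysis: D g = D D^* x, which differs from x
```

Here $\|Dg\|_{\ell^1} = 1/2 + 4\cdot(1/4) = 1.5$, whereas $\|x\|_{\ell^1} = 1$. The analysis and synthesis objectives therefore assign different costs to the same signal.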


4.1 Sparsity 

In this section, we introduce concepts for describing sparsity under an analysis operator. Intuitively, the minimization problem (3.2) favours signals $f$ for which the entries of $Df$ decay fast or are mostly zero. Note also that if there exists an index set $\Delta$ such that $P_\Delta^\perp Df = 0$, then $f \in \mathcal{N}(P_\Delta^\perp D) \subset \mathcal{R}(D^*P_\Delta)$ whenever $D^*D = I$. This is because $f = D^*Df = D^*P_\Delta Df$. In the works of [...] and [27], the signal space considered is, for each sparsity level $s$, the union of subspaces spanned by $s$ columns of $D^*$: $\mathcal{W} = \bigcup_{|\Delta|=s}\mathcal{R}(D^*P_\Delta)$.

As discussed in [...], to understand the impact of sparsity on recovery for such a model, it is natural to consider the effect of the analysis operator $D$ on any given $f \in \mathcal{W}$. In particular, one should consider the approximate sparsity of $Df$. For this purpose, [27] introduced the localization factor $\eta$, recalled in (1.2), and gave their recovery estimates in terms of $\eta^2 s$. Moreover, as observed in [8], a standard measure of the sparsity or compressibility of a vector is its $\ell^p$ quasi-norm with $p < 1$. With this in mind, we introduce the concept of localized sparsity below.

Definition 4.1. Let r G N and let N = ( Nj)j =1 G FT, s = G N r . Assume that Ni < N 2 < • • • < 

N r =: N. Let Aj = N D (Nj-i, Aj] for j = 1,..., r — 1 and A r = N D (A r -i ; oo). Let p = 2~ J for some 
J G N U {0}. Let n > 0 be the smallest number such that 


A-p/q 


> sup j||.Dg||p : g = D*x , \\Dg\\ £q =1,iG S s ,n} , Q G {2, oo} , 


(4.2) 


where we let p /oo = 0. Then, k(N, s,p) = k is said to be the localized sparsity with respect to p, N and 


For each j = 1,..., r, let Kj > 0 be the smallest number such that 


A~p/i 


> 


sup{||P Aj Z)g| 


: g = D*x , P A P 5 = l,x G £ s 


,N 


j, q G {2, oo} , 


Then N, s ,p) = kj is said to be the j th localized level sparsity with respect to p, N and s. 

Remark 4.1 Ob serve that the localized sparsity is related to the localization factor in ( |1.2| ): if p = 1 
and q = 2 in (4.2), then it suffices to let k = g 2 s. 


One can consider n( s, N) to be a measure of the analysis sparsity of an element / (i.e. sparsity of Df) 
given that it is synthesis sparse with respect to the frame {fj} associated with D (i.e. / = (Cjca x fTj 
with | A| = s and x G C A ). Note that if D is associated with an orthonormal basis, then DD* is the 
identity and it suffices to let k = Si + • • • + s r . 

The localized level sparsities k,(s,N) describe the sparsity structure of Df given that / is synthesis 
sparse with a (s, N)-sparsity pattern. Again, if D is associated with an orthonormal basis, then these 
localized level sparsities are simply the level sparsities {sj}^ =1 - 
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We also require the definition of relative sparsity, note that the only difference to Definition 3.5 
that the set 0 is defined in terms of ||-||^2 instead of II-Loo- 


is 


Definition 4.2 (Relative sparsity). Let V,Dg B(TL,TL') where H is a Hilbert space and H' is either C N 
or £ 2 (N). Let k = £ N r , N = {Nj) r j=1 e N r and M = (M ^ =1 £ with 0 = N 0 < Nx < ■ ■ ■ < 

N r and 0 = AIq < M\ < • • • < M r . For 1 < k < r, the fc th relative sparsity is given by 

k k = k k { N,M,k) = max||Pr fc Fg|| 2 , 

gee 

where T*, = (. ] flN and 0 is the set 

Q = {gGH:g = D*r 1 , \\P Al Dg\\% < Kl , l = 1,... ,r}. 
where A; = (N k _x,N k ] n N. 


4.2 Main result 


The main result of this paper describes how the reconstruction error of any solution of (3.2 1 depends on 


the choice of samples. Note that the problem of considering the minimizers of (3.2 1 is well posed since 


minimizers necessarily exist (see B.ll 


In the case of orthonormal systems, the balancing property provides an indication of the range that 
one should sample from when recovering a sparse support set A C [ N ] for some N £ N. This condition 
essentially describes how large M must be such that P[m]V is close to an isometry on 1Z(D*P/ A ) for 
all A C [N\. In the case where D is no longer constructed from an orthonormal basis, we define the 
balancing property as follows. 

Definition 4.3. Let V, D £ B('H,i 2 (N)) be isometries. Then M £ N and K > 1 satisfy the balancing 
property with respect to V , D, N, s £ N and K 2 > > 0 if for all W = 1Z(D*P A) where A C [N] is 

such that | A | = s, 


\\DQ w V*P(- m] VQ w D*\\ p ^ < ^ Kl J K2 (logf (4 ^KM) 


(4.3) 


and 


\\DQ^V*P [M] VQ w D*\\ e 2^ < 


8 yW 


(4.4) 


Although this balancing property conceptually enforces the same isometry properties as the balancing 
property presented in the case of orthonormal systems, note that the conditions are stated in terms of 
the £ 2 norm instead. This difference is due to a slightly different dual certificate construction in the proof 
of our main result, and this slightly stronger balancing property will allow us to derive sharper bounds 
on the number of samples required. We remark also that in the case where = K 2 , this balancing 
property in fact reduces to the original balancing property introduced in [Ij. 

In the following theorem, for r £ N, let M = () £ N r , m = (mi,...,m r ) £ N r , 
N = (Ni ,..., N r ) £ N r , Ni < ... < N r , and s = (si,..., s r ) £ N r . For p £ (0,1], let k = («j)J =1 with 
Kj = Kj(N,s,p) and let kj = kj (N, M, k). Let 


M = \\DD* 


and 


l^oo 


* £ N : max || P\j 

j>i 


i VD* 


< 




max 

j>i 


Q'R.(D*p [N] )D*e j 


< 


B( s, N) = sup |s(A) : A is (s, N) -sparse |, 
where, given any A C N and Wa = 7Z(D*Pa ), 


4 /’ 

(4.5) 

(4.6) 


B{ A) = max | \\DQ^D*\\ eaa ^ e<x> , ||DQw A L>*||^^oc • max ||P Ai DQ WA D*P At ||^ 00 ^ £o 


The key notations used in Theorem 4.4 are summarized in Table [l] 


(4.7) 
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Notation 

Description 

V 

D 

r 

N = (N k y k=1 
M = (M k y k=1 
m = {m k ) r k =1 
s = (s k )l = 1 

Pn.m {k, l) 

C S ,N 

Kj 

kj 

M 

B( s,N) 

Sampling operator 

Sparsifying operator 

Number of levels 

Divides the sparsifying coefficients into levels 
Divides the sampling coefficients into levels 
Number of samples at each level 

Level sparsities 

(fc, l) th localized coherence, see Definition 3.6 
See Definition 3.4 

j th localized sparsity, see Definition 4.1 
j th relative sparsity, see Definition 4.2 

See (4.51 

See (4.61 


Table 1: Summary of the key notations for Theorem 4.4 


Theorem 4.4. Let TL be a Hilbei't space and let V,Dg B(H,£ 2 (N)) be isometric linear operators. Let 
f € TL. Suppose that fl = 17 m, m is a multilevel sampling scheme. Let (s,N) be such that the following 
holds: 

(i) the parameters M = M r ,q~ 1 = max ^-^. r j Mk ~ n ^ k k ~ 1 j , satisfy the balancing property with respect 
to V, D, N ■= N r , K m i n = rmin{Kj} and At max = rmax{nj}; 

(ii) For e £ (0, e _1 ], 

Mk — M k 1 


1 ^ \fr log(e 1 ) log (q 1 M^/n max ) B( s,N) 


m k 


0 K i 


k = l,...,r, 


V Z = 1 


and m k > rrhkB( s, N) 2 log(e log (q 1 M v /« max ^ , where rhk is such that 

1 \ f Mk — M k ~i \ 2 n i\ ~ i — i 

1 -C / y I ;; 1 I 0 K k, l — 1,... ,r. 

V m k J 


Suppose that f is a minimizer of (3.2) with y = PqV f+rj and ||? 7||^2 < <5. Then, with probability exceeding 
1 - e, 

11/ - /II < C (q~ 1/2 5 (1 + L v/^) + <7 B , N (£>/)) , 

for some constant C, where ct s .n is as in (3.3), and L = 1 + log ) ■ V m fc = M k — M k -i for 
1 < k < r then this holds with probability 1. 


4.2.1 The unconstrained minimization problem 


Instead of solving the constrained minimizaton problem in Theorem 4.4, for computational reasons, it is 
often of interest to solve instead an unconstrained minimization problem for some a > 0, 


Ma\\Dg\\ el + \\P Q Vg - y\\%. 
gen 

The following result presents a recovery guarantee for this unconstrained problem. 


(4.8) 


Corollary 4.5. Consider the setting of Theorem 4-4 an d let a = \fqS- Then, with probability exceeding 
(1 — e), any minimizer f of (4-8) satisfies 


f-f 


H 


— ^ (^1 ' / * J T L\/K max T L zt max ^ T <tn, s (H/). 
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Remark 4.2 Note that by choosing a = -^/qS, the guaranteed error bound is, up to ^qL 2 s , the same as 
the guaranteed error bound of solutions to the constrained problem 


inf H-DgLi subject to 
gen 


\PnVg - y\\ p < S. 


This affirms the finding in [3J Figure 7], which numerically demonstrates that there exists a linear relation 
between the regularization parameter a and noise level of the measurements S. Moreover, this linear 
scaling increases as q increases. 

4.3 Remarks on the main result 

4.3.1 On the factor r 

In the case where D is associated with an orthonormal basis, the key difference between our main result 


and Theorem 3.8 is that bounds on the number of samples in Theorem 3.8 has a factor of log(s) while 
the bounds in Theorem 4.4 have a factor of r (the number of levels) instead. In general, the sparsity 
s may grow as the ambient dimension N r increases, whilst the number of levels r can be thought of as 
simply a constant; for example, r = 2 in the case of the half-half schemes presented in jj2] (see also [35] 
for the application of a half-half scheme in fluorescence microscopy). Therefore, Theorem 4.4 may be 


considered to provide slightly sharper bounds than Theorem |3.8| and is in fact optimal in the case where 
r = 1 (since the optimal sampling cardinality is O (slog N) [7|). We remark however, that by utilizing 
the construction of the dual certificate from [ 2 ], one can replace the factor of r with log(M). 

4.3.2 Remarks on reconstructing -lets from Fourier samples 

We begin with a corollary of Theorem |4.4| Its proof can be found in Section [5] 

Corollary 4.6. Let V and D be isometries. Let {NjY- =1 ,{Mj} r . =1 £ N r with Tj = (Mj_i,..., Mj] D N 

and Aj = (Nj- 1, ..., iVj]nN where Mo = Nq = 0. Let u> be a non-negative function defined on {1,..., r } 2 
with 


5 >(M)<tf, k = 1,... ,r, 5 >(fc, 1)<C, 

1=1 k =1 

for some C > 0. Suppose that 

||Pr fc WD*P Ai || < w(M), Mn,m(M) < w(M) ’ min 


1 = 1 , 


1 1 

iW Mfe_! 


(4.9) 


(4.10) 


Then condition (ii) of Theorem 4-4 holds provided that 


m k > rC 2 B{ s, N ) 2 • log(e : ) log(g 1 M y /K max ) 


M k — M k -1 
Mk -1 


w(fc, l)ni 


i=i 


In particular, we have that 


mi H-b m r > C ■ log(e )log{q My/K max ) ■ («i H-h n r ), 

where C = rC 3 P(s, N) 2 max ^ =1 {(M k — M k -i)/Mk-\}■ 

Note that the dependence of our main result on the localized coherence terms allows one to exploit 
both asymptotic incoherence and the correspondences between the different sparsifying and sampling 


levels. Conditions (4.9) and (4.10) essentially control the correspondence between the different sampling 


and sparsifying levels, whilst maintaining asymptotic incoherence in VD*. Under these conditions, 
this result presents a direct link between the localized sparsities and the sampling strategy where the 
dependence of the number of samples in the k th level m k on the I th localized sparsity ni is weighted by 
u>(k,l). Note also that the only other dimensional dependencies consist of one log factor and the factor 
of P(s,N), which numerically does not seem to be significant (see Section 4.3.3). 

Of course, further analysis of Corollary |4.6| would be necessary for a full comparison between our 
results and Theorem ED however, an advantage of Corollary |4.6|is that it makes explicit the dependence 
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between how one should subsample and the sparsity structure, and provided that B(s, N) remains 
bounded, Corollary |4.6| will provide for a sharper estimate on the sampling cardinality. In the case where 
V is constructed from a Fourier basis and D is constructed from a wavelet basis, it is in fact the analysis 


of (4.9) and (4.10) that enabled [2] to derive sharp bounds on the sampling cardinality. We now explain 
this in more detail: 

[2] considered the case where 

D : I/ 2 [0,1) — > f 2 (N), f ^ {(fi<Pj))je n 

for some orthonormal wavelet basis {<Pj} associated with a scaling function $ and a mother wavelet T 
satisfying for all £ £ M, 


<a + k e ir 


*(0 <(i + kir^- 


(4.11) 


where $ and 4/ denote the Fourier transforms of $ and respectively, and the Fourier sampling operator 

V : L 2 [0,1) £ 2 (N), / >-+ ((/, e“')) feez 

for some appropriate Fourier sampling density cj £ (0,1]. Then, if we let (Mk) and (Nk) correspond to 
wavelet scales, so that Mk = O ( 2 Rk ), Nk = 2 flfc and (Rk)k-i G N r is an increasing sequence of integers, 
then we can let 

w(k,j) = mm{A R *- Rk - 1 ,B Rk - R *- 1 } (4.12) 

where A, B > 1 are constants which depend on the Fourier decay exponent /3 and the number of vanishing 
moments of the generating wavelet. Furthermore, since D is constructed from an orthonormal basis, 
Kj = Sj for each j = 1 ,...,r, and B( s,N) = 1. [2] also analysed the balancing property in the 

Fourier/wavelets case and condition (i) can be shown to hold provided that 

M r > log 2 (e- 1 lV r ^- 1 ) 1/( ^- 1) , 

and log(M) < log(M r ). So the number of samples needed on the k th level is 


mk > C 


Mk — Mk -1 

Mk- i 


Sk 


E ^ 

j<k-l 


Rj—Rk-i, 


E BRk 


Rj— 1< 


j>k +1 


with C = rlog(M). Note that the total sampling cardinality is, up to one log factor and the ratio 
1 { } ! li near with the total sparsity. 


max 


k =1 


It is likely that by carrying out a similar analysis in [2], one can apply Theorem 4.4 to derive sharp 
recovery results for the recovery of coefficients with other multiscale systems, such as slrearlets and wavelet 
frames from Fourier samples. This work is beyond the scope of this paper, however, we simply highlight 
two aspects of any such analysis. The first part of the analysis would include precise estimates on the 
correspondences between the different sampling and sparsifying levels (i.e. analysis of ui in Corollary 
4.6). In the case of orthonormal Fourier and wavelet bases, the choice of u> in (4.12) is simply due to 


the Fourier decay (4.11) and the number of vanishing moments in the underlying wavelet and not on 


orthogonality properties. Thus, such a choice of w would also suffice in the case of wavelet frames with 
Fourier decay and vanishing moments properties. Furthermore, since similar Fourier decay estimates and 
vanishing moments properties also exist for multiscale systems such as curvelets and shearlets, it would 
be possible to derive similar estimates in the case of other multiscale systems. Secondly, the key difference 


between Theorem 3.8 and Theorem 4.4 is the localized sparsity k(s,N) and the localized level sparsities 
with respect to D. As mentioned, these terms are equal to the sparsity and level sparsity terms when 
DD* is the identity, furthermore, it is known that multiscale systems such as wavelet frames, shearlets 
and curvelets are intrinsically localized with near-diagonal Gram matrices. It is therefore conceivable 
that this property can be exploited to show that localized sparsity k is close to the true sparsity s. This 
idea is further discussed in 95.11 
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Figure 4: Plot of E from (4.131. 


4.3.3 OnB(s,N) 

In the case where D is associated with an orthonormal basis, B(s, N) = 1. Further analysis of this 
quantity will be left as future work, however, we simply remark here that initial computations of this 
quantity suggest that the impact of £?(s,N) will not be significant: To test the behaviour of £?(s, N), 
consider the following experiment where we test the behaviour of this quantity when considering the 
support of piecewise constant vectors, under the redundant Haar transform D. Given p £ N, let N = 2 P , 
D be the discrete Haar wavelet frame transform and compute 


E(p) = max |i?(A) : A = supp(D.T), x £ A^r j 


(4.13) 


where B is as defined in (4.7), and A^v is a collection of 1000 randomly generated piecewise constant 


vectors, each of length N. A plot of E for p = 4,..., 10 is shown in Figure [4] 


5 Localized sparsity 

In this section, we present some basic properties of the localized sparsity defined in Definition |4.1| The 
key findings which would be of use in the proof of Theorem |4.4| are summarized in Corollary |5.4| 

We first present Lemm a|5.1| to show that provided that VD* satisfies some “block diagonal” structure 
(so that uj(j, k ) in Lemma |5.1| decays sufficiently as \j — k\ increases), each relative sparsity term kj can 
be controlled in terms of {«;};=i an d the dependence on each m decays as |j — l\ increases. So Lemma 
|5.1| can then be applied to derive Corollary |4.6| which shows that under an additional assumption on 
the structure of VD*, the signal dependencies of Theorem |4.4| arise only in the localized level sparsities 
k = {nj} ■. Note that this block diagonal property can be shown to exist when V is a Fourier sampling 
transform and D is a wavelet transform ]T|. 

Lemma 5.1. Let V and D be isometries. Let {Nj}'- =1 ,{Mj}'. =1 £ N r with Tj = (Mj_i,... ,Mj] D N 
and Aj = (Nj-i ,..., Nj] D N where Mq = No = 0. Suppose that \\Pr k VD*P\ t || < u>(k, l ) and 

r 

w(fc, l) < C, k = 1,..., r. 

2=1 

Then 

r 

kj < cy»u,i )K i. 

i=i 
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Proof. Since D* D = /, 


= max \\P T] VD*Dg\\ 2 < max ( X \\P T] VD*P Kl Dg 


gee 


kI =i 


< max ( £ \\Pr :i VrrP Al \\\\DgW ) < ( £ 


d=1 


a=i 


< x c ^ • 


v/=l 


Z=1 


Z=1 


□ 


Proof of Corollary \4-6[ Since ^ M (^0 — ^(/c, /) min {1 /_!, l/M^-i}, condition (ii) of Theorem 
is implied by 

>C.^ Mk ~ Mk ~' 


4.4 


m fe 


Mk-i 


X^ w(fc, Z) • k/ , k = 1,.. .,r, 


«./=! 


and 


i r 5 v~' Affc — Mk —i uj(k,l) 

t r^j P ' ^ ' / j ’ t, r * ? 


m fc 


Mk-i 




k =1 

where B = rB( s,N ) 2 and £ = logfe -1 ) log(g~ 1 M^//t max ). 

Since X^I-=i w (fc, 0 — ^ f° r eac ^ ^ = 1, • • ■, r, (5.2) is true provided that 


TOfc 


> 


CBC(M k - M fc -i) 

Mk-l 

By Lemma |5.1[ this is true provided that 

C 2 BC(M k - M fc _i) 


Kfc, 


k = 1, • • • ,r. 


m fe 


> 


M k -i 


K k , k = 


(5.1) 


(5.2) 


(5.3) 


Note that (5.31 also implies (5.1). 

Finally, the last statement of Lemma 5.1 follows by summing up the TOfc’s and using uj{k,l) < 
for each l = 1,... ,r. 


Lemma 5.2. Let p < 1. Suppose that x has at most s non-zero entries and ||x||^, = 1 for some q > 1. 
Then, ||x||^ p < s 1 ~ p ^ q . 

Proof. Let A denote the support of x. By Holder’s inequality, 

( \ p/q 

Eton (| a |) 1-p ^ 9 = s i_p / ? . 
je A J 

□ 


Lemma 5.3. Let p £ (0,1] and let q £ [1, oo]. Suppose k > 0 is such that ||x|L 9 < 1 implies that 

IMIp. < p/q . 

(i) Hariri < k\\x\\ (00 . 

(n) \\x\\% < k|M|^. 

(in) If p = 2 ~ l for some L £ NU {0}, then \\x\\%i < k||x||^ 2 . 
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Proof. Without loss of generality, first assume that ||x||^oo = 1. Then, ||a;||^ p < k. So, (i) follows because 


\ x i\ p \ x j\ l p ^ w p - 


3 3 3 

and (ii) follows because 

J 2 N 2 = N p N 2 ~ p - N p - k - 

3 3 3 

To show (iii), assume (without loss of generality) that ||x||^ 2 = 1 and recall that p = 2~ L . Then, by 
repeatedly applying the Cauchy-Schwarz inequality, 

\ 1/2 

& = (EMI sEM P EM 2(I_p/2 ) s1EI- 12 ' 2 ’’' 


1/4 


1 / 2 l 


< k'-^imi.W 2 v \x,\ 2 -‘ r < «'- p/ 2 iwi ;. +1/2 E M 7 ~ 2 " r 




and by dividing both sides of the inequality by ||*||^i, we obtain 


|2(l-p/2) < K l-p/2 


\x\\p < K. 


□ 


Corollary 5.4. In the notation of Definition \f . l\ a direct application of the two lemmas presented above 
(with x := PA l DD*y, l = 1,... ,r) would yield the following results: 


1. Lemma 5.2' if D was the analysis operator of an orthonormal basis (so that DD* = I) in Definition 


f.l, then k( s, N) = si + • ■ ■ + s r . 


2. Lemma 


5.3 (i) : If y £ E Si n and DD*y\\ ioo = 1, then 

\\P Aj DD*y\\ < Kj(p, N,s), j = l,...,r. 


3. Lemma 


5.3 (ii) : If y £ E s .n and '\P Aj DD*y L = 1, then 


\P Aj DD*y\\ f2 < Kj(N,s,p), j = l,...,r. 


f. Lemma 5.3 (ii) : If y £ E s .n and | \P Aj DD*y 2 = 1, then 


\P A DD*y\\ < Kj(N,s,p), j = 1, ...,r. 


Numerical example: the Haar frame 

In the case where D is associated with a Haar frame on C N , it can be shown that if |supp(x)| = s then 
DD*x has at most O (slog N) nonzero entries (see [2Z|)- Therefore, from Lemma [5?2| k(N, s ) ^ s log (AT) 
where s = (sj^J—j and s = Si + ... + s r . In the case of a Haar frame, experimental results suggest that 
the localized level sparsities {kj} tend to follow a similar pattern to the level sparsities {sj}: Let D be 
the discrete Haar frame, and let / £ R 1024 be as shown on the right of Figure [H] Let A be the support 
of Df. Let S consist of 1000 randomly generated vectors, each supported on A. For each j = 0,..., 10, 
let A j index all Haar framelet coefficients in the j th scale and let 

Rj = sup {II-Pa^ooII^, UPa^I) 2 ! : rjoo = DD*x/\\DD*x\\ too ,y 2 = DD*x/\\DD*x\\ eoo , x £ S j . (5.4) 

We also let Sj = | A D Ay |. The bar charts in Figure [5] show for each j = 0,..., 10, {sy / | Ay |} (centre 
plot) and {Rj/ |A^ |} (left plot). Note that {Rj} merely approximate the localized level sparsities {nj{, 
because otherwise, we would need to consider all (s,N)-sparse support sets instead of just one support 
set A and we would also need to maximize over all vectors supported on A, instead of just 1000 randomly 
generated vectors. Nonetheless, Figure [5] provides some indication of the behaviour of the localized level 
sparsities. 
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/ Sparsity in levels Localized sparsity in levels 



Figure 5: Left: the original vector. Centre: the level sparsities of Df at each scale. Right: the 
approximate localized level sparsities at each scale, as defined in (5.4). 


5.1 Intrinsic localization 

As mentioned previously, many of the popular frames such as curvelets, shearlets and wavelet frames are 
intrinsically localized so that their Gram matrices are near diagonal. This property has been studied for 
wavelet frames in mmm and more recently for anisotropic systems such as shearlets and curvelets 
in [23l 124] . In this section, we will show how the property of intrinsic localization can yield estimates on 
the localized sparsity term, k, and the localized level sparsity terms, fiy’s. We first recall the notion of 
intrinsic localization mm- 

Definition 5.5. Let H be a Hilbert space and let d/ = {ipj}be a frame for H. T is said to be 
intrinsically localized with respect to c > 0 and L > 1 if 

l(Vb,V>fc)l < (1 + [ Vj,fc e N. 

Given A, A C N and p € (A -1 ,1], let 


I P { A, A) = max^ \{ifk,ipj)\ P ■ 

Remark 5.1 Under this definition, wavelet frames have been shown to be intrinsically localized m 
with the parameter L being dependent on the regularity of the generating wavelets. For the anisotropic 
systems studied in [23 and ,24], the definition of intrinsic localization used is more complex than the 
definition presented above. However, the key idea of how to exploit this property to obtain bounds on 
the localized sparsity values should still be applicable. 


Remark 5.2 Given any A, A C N and p £ (L 1 , 1] , note that Lp > 1 and 


/„(A, A) < max 
je A 


E 

ke a 


c 

(1 + I j - k\) L P 


< 1 + 




da; < 1 + 


c 

Lp — 1 


So I p ( A, A) is finite. Moreover, if we let 


d = dist(A, A) := min \k — j| > 1, 


then 


I P { A, A) < 



c 

(Lp — 1 ) • d L P _1 


The main result of this section is as follows. 
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Proposition 5.6. Let 'P be a Parseval frame which is intrinsically localized with respect to c ~ 1 (to 
simplify the amount of notation only) and L > 1 and let D be the associated analysis operator. Given 
any N = (N k ) k=1 G N r and s = (s k ) k=1 £ N r , let s = si + ■ ■ ■ + s r and N 0 = 0, 

B = supH^Pa-D) 1 !! : A c [N r ], |An {N k - iV fe _ 1 ]| = s fc } < oo, 


and 

B' = sup{\\(P A DD*P A y\\ too ^ eoo :Ac[N r ],\An(N k -N k _ 1 ]\ = s k }<oo. 


Le t dj _ k — 
Definition 


dist(Aj, Afc), and recall the definition of localized sparsity and localized level sparsities from 
4-1 Let p £ (A -1 , 1], Then, 


(i) 


k(N, s ,p) < s ■ max |(i? p (l + 1 /(Lp — l))) 1 ^ 1 , B p ( 1 + 1 /(Lp — 1))| 


(ii) For j = 


Kj( N, s,p) < Sj ■ max < B p ^ 1 p ^ 2 ' > ( 'Y, 


\k=1 


(Sk/Sj) 1 p/2 

^T 1 , 


1/(1—p/2) 


B p 


E S k / Sj 
d Lp -1 
fc=1 U j,k 


Proof. Note that B and B' are both finite, since there are finitely many subsets of [N r \ and for each 
subset A, {'*/b'}j e A * s necessarily a frame for its span with a strictly positive lower frame bound, (i) 
follows from taking the maximum of (i) and (ii) of Proposition 5.7 over all A subsets with an (s, N)- 
sparsity pattern and plugging in the estimate of I p from Remark 5.2 (ii) follows from (iii) and (iv) of 
Proposition |5.7| □ 


Proposition 5.7. Let p £ (0,1] and let A C N with |A| = s. Then, for all g £ 1Z(D* P A ), 

(i) if\\Dg\\ p = 1, then \\Dg\\ p p < || (P A D)i ||%(A, N)* 1 "*/ 2 ; 

(ii) if\\Dg\\ too = 1, then \\Dg\\^ p < || (P A D)^ || P / p (A, N)s. 

Let {A jYj=i he a partition for N, and let A j = Aj n A and Sj = |Aj|. Then, for all g £ 1Z(D* P A ), 

(iii) if ||Hg ||^2 = 1, then 


r 

\\PK n Dg\\ p p < ||(P A ^) t f^ J p (A m ,A„) s ^ /2 , n = 1.r; 

m—1 


(iv) if HDgll^oo = 1, then 

r 

\\Pa p D 9 \\ p p < ||(P A £>D*P A ) t ||J ao _^ 00 ^/ p (A m ,A„)s m , n = 1,..., r. 
Proof. For (i), suppose that \\DD*P A x\\ p = 1. 


m—1 


\DD*P a x\\ p £p = Y 


ke N 




j£ A 


< YY1 N p 

fceNjeA 


= Y \ x i | P I l P - ™ a A X!I ^ k ' ^') l P ) • H \ x i I 5 

ieA fee A \ fceN / je A 

< (max^ |(^fc,V'j)| P ) • lkl| P 2 ■ |A| 1_p/2 

\ je keN J 

< l^fe’^')| P ) •||(PA^) t ||"-||^PAX|| P 2- S 1 - p/2 ) 


(5.5) 
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where we have applied the Cauchy-Schwarz inequality in the penultimate line. Therefore, 

|| DD*P A x\\ p p < I(A,N)||(P A £>) t HV- p / 2 . 

To show (ii), if we instead assume that \\DD* P A x\\ iao = 1, then since 

\\DD*P A x\\ p 2 = \\DD*\\ p 2 \\x\\ p 2 < \\DD*\\ p 2 s p ' 2 , 


by plugging this into the last line of (5.5), we obtain 

\\DD*P a x\\ p p < /(A,A)||(P A P>)tf s. 

The proof of (iii) is similar to the above: if ||D*P a .t|| = 1, then for each n = 1,..., r, 
\\P An DD*P A x\\ p £p < Y ( max Y \(^k,^j)\ P ] ' Y 


, je 

m—l V fee A 


je a„ 


< Y H \ p ) • II-Pa^II^ ■ l^l 1 p/2 

m—l jGA m m feGA J 

< \\{p.Dn p -±i^K)-s^ 2 , 

m—l 

Finally, to show (iv), if \\D* P A x\\ eoo = 1, then for each n = 1,..., r, 


(5.6) 


\\P An DD*P A x\\ p (p < Y ( max Y \(i’k,^j)\ P ■ Y N 

/ je A m 


, je A„. 

TO =i v feeA 


< ^ ^ max ^ |(^fc,^)| P ) • ||PA m a:||^ • |A„ 

m= 1 j£ A m m feeA / 

r 

< \\(PADD*P A y \\ P aa ^ 00 ■ Y H&m, An) • s m , 

m—l 


(5.7) 


□ 


6 Conditions for stable and robust recovery 

Given AcN and some / € H, the following proposition presents conditions under which one is guaran¬ 
teed robust and stable recovery up to ||P^P/|| £1 . 

Proposition 6.1 (Dual certificate). Let f £ TL. Let A C N be such that |A| = s. Let W = TZ(D*P A ). 
For reN, let q = }y=i ^ (O’ l] r an d ^ {^fc}fe=i be disjoint subsets ofN. Let f2 = fii U • • • U Ll r . 
Suppose that 

(i) || QwV*(qf 1 Pn 1 ® • ■ ■ © qf 1 Pn r )VQw - Qw|| < j, 

(ii) sup ieN \\P {j} DQ^V*(q)) 1 P ni 0 • ■ ■ © qf 1 Pn r )VQ(^D* P {j} \\ < §, 
and that there exists p = V*Pqw and L > 0 with the following properties. 

(iii) \\D* P A sgn(P A Dx) - QvvpII < g 1/2 /8, 

(iv) ||P a DQvvP|Loo < 1/2, 

(v) |MI*» <!/«■ 


Let y £ f 2 (N) be such that \\PnVf — y\\ < 5. Then, any minimizer f G TL of (3.2) satisfies 


f-f 


H 


<5 (V 1/2 + V/«) + \\p A Df 


U 1- 


( 6 . 1 ) 
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Proof. Since D is an isometry, D*D is the identity on TL. Given any g eM, 

Qw 9 = QwD*Dg = Q^D*PiDg, 


( 6 . 2 ) 


since Qyy is the orthogonal projection onto TZ(D*P A ). So, using the assumption that 11-011^^2 < 1, for 
any g eW, 

lid'll < IIQwdll + llQwdH < IIQwdll-H + \\ P A D 9\\ e - 
Now, let h = f — f. To bound ||/i||^, it suffices to derive bounds for ||<3yy/).|| and IIT’a^^I 


P-' 


Let Vn,q = Qv\>V*{q 1 1 Pq. 1 ® • • • ® q r 1 Pn r )VQw- We first observe that (i) implies that Vh iq has a 
bounded inverse on Qw(T-L), with 


and 


\(QwV*{qi 1 Pq 1 ® • • • ® q r 1 Pn r )VQw) 1 

(qi 1/2 Pn 1 ®---®q~ 1/2 Pn r )VQ w < \j\- 


4 

U-+U — 3’ 


Observe also that 

lift Vft|| < ||PaVf - y\\ + \\P n Vf - y\\ < 26. 
By applying the above observations, we have that 


\Qwh\\ = V n q Vh, q /i 


< 




H 




Qr Pn r )V(h — Q w )h 


1/2 6+ \j\ ( ( h 1/2 Pn 1 ®---®qr 1/2 Pn r )VQw h 


Also, by (ii), 

(?r 1/2 ft i 


>q~ 1/2 Pn r )VQhh = (<7]~ 1/2 ft 1 


< sup 
j eA= 


fe- 1/2 Po 1 ffi • • • ® q- 1 / 2 Pn r )VQhD*e 3 


>q- 1/2 Pa r )VQ^D*PiDh 

fh 


P£Dh\„ < \l-WP£Dh 


< 2 


U 1 - 


pi- 


Plugging this into (6.3) yields 


\Qwh\\ u <2(q 1/2 5 +\\P^Dh\\ t ^j . 


So, to bound ||/i||^, it suffices to bound ||P^-Z)h|| £1 . 
Observe that 


(6.3) 


(6.4) 


\\Df\\e = \\P£ D (f + h)\U + II PaDU + h)\\ e 

> \\ P ADh\\ el - \\P£Df\\ tl + \\P A Df\\ 1 + Re(P A Dh, S ga(P A Df)} 
= \\P£Dh\\ el -2\\p£Df\\ ei + \\Df\\ 1+ Re(P A Dh,sgn(P A Df)). 


Since / is a minimizer, we can deduce that 


\p- 


\\PkDh\\ £1 < \(P A Dh,sgn(P A Df))\ + 2\\P£Df\\ 

Using the existence of p = V*Pq,w and recalling that Qyy = Q^D*P^D from (6.2), we have that 


\(P A Dh,sgn(P A Df)}\ = | (h, D* sgn(P A D f)}\ 

< I (h, D*sgn(P A Df) -Q w p)\ + \{Kp)\ + \(h,QwP)\ 

< ||Qw^|| w ||L , *sgn(P A L ) /) - QwpW-u + \\PnVh\\ e \\w\\ e + \(D*P^Dh,Q^p) 

~ ^8~ IIQw^II-h + 26L y/n + - ||P A 

< <5 ^- + 2 Ly/r^J + -\\P^Dh\\ (1 , 
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where we have used the bound on HQvv^-ll-w from (6.4) to obtain the last inequality. Therefore, 

< 8 \\ P ADf\\ e + 6 (1 + 8 Lytf) . 


Finally, combining this with (6.4) yields 


m 


< 5 (2 q- 1 ' 2 + 3 (1 + 8-LVk)) + 16 \\P A Df\\ £1 . 


□ 


Pro position 6.2 (Dual certificate for the unconstrained problem). Consider the setting of Proposition 
and assume that conditions (i)-(v) are satisfied. Let a > 0 and suppose that f G TL is a minimizer of 


6.1 


inf a\\Dg\\ t i + \\PnVg - y\\ p , 


where y G £ 2 (N) is such that \\PqV f — y\\ < 5. Then, 

2 


/-/ 

Thus, if a = qS, then 


fz — — I - ck ( —— + L\J k'] + S ( —— + L\Jh ) + \\P A D f I 
n a y^/q J \^q 1 11 1 


/-/ 


H 


< S + L\[k+ sfqL 2 H^j + \\P^Df\\ e . 


Proof. Let h = f — /, just as in Proposition 


6.1 


\\h\\ H ^ \\Qw h \\n + \\ P A D h\\ e i 

and it sufhces to bound the two terms on the right side of the inequality. We first consider HQyy^ll 


By applying assumptions (i) and (ii), we can proceed as in the proof of Proposition 6.1 to derive 


H- 


\\Q W h\\ n < ^ (jlq~ l ' 2 \\P a Vh\\ p + l\\P£Dh\\ e ^ . 


(6.5) 


Then, by letting A = 


we have that 


V - PnVf 


p 


and observing that 


\\PnVh\\ £2 < \\PnVf-yWe 


PnVf - y 


<6 + A, 


WQwHn < g + />v ) + 


< 2 ( + \\P£Dh\\ p 


Vv 


( 6 . 6 ) 


To bound ||P^D/i|L, first observe that 


Df 


P 


>a\\P A Dh\\ (1 - 2a\\P^Df\\ el +a||D/||^ + aRe (P A Dh,sgn(P A Df)) 
+ \ 2 -\ 2 + \\y-PnVf\\%-6 2 . 


Since / is a minimizer, it follows that a 


Df 


+ A 2 < a\\Df\\ e + ||y - Pq.V f\\% and therefore, 


a\\PiDh\\ p + A 2 < 2a||PiD/|| /1 + a\{P A Dh,sgn{P A Df))\ +S 2 . 


(6.7) 
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In the same way as in the proof of Proposition 6.1 we may apply the properties of the dual certificate 
p = V*Pqw to bound \{P/\Dh, sgn(P A D/)}|, so that the following holds. 

|(P A P>/i,sgn(P A .D/))| < ^-WQwHu + \\ p nVh\\ e2 \\w\\ (2 + h\P^Dh\\ el . 


By inserting the bound from (6.6), recalling that ||PnF/i ||^2 < 6 + A and that ||w ||^2 < Lyfk, it follows 
that 


|(P A Dpsgn(P A .D/))| < (6 + X) Q +Ly/itj + ^\\P^Dh\ 


e 1 ' 


Plugging this bound into (6.7) now yields 


This implies that 


+ 2 \\PA D h\\ii < 2a||P A £)/||^ 1 + a(5 + A) ^ - + Ly/n^j + 5 2 . 


A 2 — a ( - + Lyfn ] A — 5 2 — a6 ( — + L\fk ] < 0 


( 6 . 8 ) 


and by applying the quadratic formula and observing that A > 0, it follows that 

r/4 + PayP + ^/(a/4 + Lay/ k) 2 + 4 (5 2 + a<5(l/4 + PyP) + 2a P^Df ^) 


A < 


Note that ||• ||^ 2 < ||-||^i, and so, 

A < a/4 + Layfk + 6 + yj a5(l/4 + Ly/n) + yj2a\\P^Df\\ (1 
< a/4 + Layfa + 5 + a(l/8 + LyfR,/ 2) + 6/2 + ^2a\\P^Df\ 


t 1 


(6.9) 


3a 


35 


— ~(l/4 + Ly/n) + — + y/2a\\P^Df\\^, 


2 v ' 2 

where the second inequality comes from the fact that yfab < (a + b)/2 for any a, b > 0. We also know 
from (6.81 that 

llP^IU < 4||p^P/|| fl + (6 + A) + 2 Ly/Ti'j + 


a 


By combining this with the bound from (6.6), 

\\h\\ H < WQwHh + |I P A^|| 




^ U\\PkDf\\ tl + 6 + 2 Ly/K^j + 

4 + (4 +3 P +2 ^))a 


2<P\ 

a ) 


y/q \Vv 

y ||P A P/|Li + 5(1 + Py/~K) 4—— 4—— + f —— + 1 + Ly/7/\ A. 

f a y/q \y/q J 


Recalling ( |6.9[ ), 
1 


V~q 


1 LxJTi ) A < ( —— + 1 + Lx/, 


V* 


+ 5 + yj a||P A £l/|| fl ^ 


P ( —p- ~V 1 + L\[k j (a(l + Lyfri) 4- 5) 4-)- ||P A Z }/||^ 1 + a(l + PyftP) 

W q J q 

< a + 1 + Ly/Tc'j + 6 + 1 + Ly/isJ + ||P A P/||^i. 


Therefore, 


h\\ H < -hi 


a 


~+ 1 + Ly/HJ + 6 + 1 + ) + \\PtDf\\ n . 


□ 
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7 Overview of the proof of Theorem 4.4 


The remainder of this paper is focussed on the proof of Theorem 4.4 and we begin by setting some 
notation which will be used throughout. 

Let V,D £ B(H, £ 2 (N)) be isometries. Let f £TL. For r £ N, M £ N and N £ N, let N = {2Vfc}fc =1 e 
M = {M k } r k=1 ef,s = {skV k=1 eN r ,m = {m fe }]~ =1 e with 

• 0 = M 0 < Mi < • • • < M r =: M, and let T k = {M k _\, M k ] D N and Q k ~ Ber(^, T k ). 

• 0 = N 0 < Ni < ■ ■ ■ < N r =: N, and let A k = {N k _i,N k \ D N for k = 1 ,...,r — 1 and A r = 

(7V r _i, oo) D N. 

• m k < M k — Mfc_i and let q = {qj}' j=1 £ (0, l] r with q 3 = mj/(Mj — 

• s k < N k — N k _i and let A c [N] be such that |A| = si + • • • + s r =: s and A*. = A k n A, |A*,| = s k . 

Let W = K(D*P a ). 

For some p £ (0,1], we will write n = {«j}^ =1 with ni = ki( N,s,p) and ki = k;(N,M,k) for each 
l = 1,... ,r. Let K m i n = rmin[ =1 Ki and ft max = rmax[ =1 «/. Note that K m ; n < Yl'i =i K j — K max- 
We also define T £ S(^ 2 (N),1! 2 (N)) such that given any x = (xj)j^ £ £ 2 (N), 

Xj 


Tx = y, y 3 = 


j £ A fc , k = 1,..., r. 


max {l, y/rn k ] 

Observe that T is an invertible operator, ||T|| < l/y'ftmin and ||T _1 || < 


7.1 Outline of the proof 


To prove Theorem 4.4 it suffices to show that conditions (i)-(v) of Proposition 6.1 are satisfied with high 
probability whenever the sampling scheme is the multilevel sampling scheme 0 = described in 

Theorem |4.4| To this end, we first remark that it has become customary in compressed sensing theory 
to deduce recovery statements for uniform sampling models by first proving statements based on some 
alternative sampling model which is easier to analyse. One approach, considered in mm is to consider 
a Bernoulli sampling model, defined below. 

Definition 7.1. Let M = {M k } k _- l with 0 = Mq < Mi < • • • < M r . Let := 0® er U • • • U S2® er , 

where S2® er := {Sj ■ j : j £ F^} with Tfc = N fl (M k _i, • • • , M k \, and 5 j are independent random variables 
such that P(<5j = 1) = m k /(M k — M k —i ) and P(<5j = 0) = 1 — m k /{M k — M k - 1 ). The Bernoulli 
sampling set described will be denoted by S2(? er ~ Her(m k /(M k — M k -i),T k ) and we say that 

^Mm = ^? er O • • • U is a Bernoulli (M, m) -sampling scheme. 

As explained in 0 II.C] (see also OS), the probability that one of the conditions of Proposition 
6.1 fails for Q = 12m, m chosen uniformly at random is up to a constant bounded from above by the 
probability that one of these conditions fails under the Bernoulli (M, m)-sampling scheme, SI = S2^ r m . 

So, to prove Theorem |4.4| it suffices to show that conditions (i) to (v) of Proposition 6.1 hold with 
probability exceeding (1 — e) with satisfying the following assumption. 


Assumption 7.2. Let C = (log(e 1 ) + l) log (Mq 1 Kma*\\DD* 




3 ) and 


B = max < II DQ^D 




\ 


\DQ W D* 




max - 
2=1 


|Tv, DQk(d*p&.)D*P h t | 




Let 


M = min < * £ N : max 
l 3>i 


P[M]VD*ej \\ e2 < 


q 

S-\/Prnax 


Suppose that 


max 

j>i 


Q'R(D-‘P [N] )D*e j 


< 




f 5 
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(a) 


DQ W V*P^VQ W D 


t 2 ^t 2 

*rp— 1 I 


^\/& lo S 1/2 (4 q^y/K^M). 


(b) || dq^v*p [m] vq w d*t-% 2 ^ < 

(c) For k = 1,... ,r, 

r 

Qk'ZV'rZB ^/iN,M(M) ki. 

l=i 

(d) For k = l,... ,r, qk > £ qk with {<Zfc}£ = i satisfying 

r 

1 £ rB 2 ^(C 1 - i^N.M ( k ,j)Kk, J = 

fc =i 


Note that this assumption is strictly weaker than the assumptions of Theorem |4.4[ a nd by showing 
that conditions (i) to (v) of Proposition 6.1 we will prove that the error bound ( |6.1[ ) holds for one 
support set A. So, by ensuring that the conditions of this assumption hold over all A sets which are 


(N,s) sparse patterns (as required by Theorem 4.4), we can conclude that (6.1) holds for any (N, s) 
sparse support sets. 

Under this assumption, 


(j9]will show that conditions (iii) to (v) of Proposition 6.1 are satisfied with probability exceeding 
1 - 5e/6; 


110 will show that conditions (i) and (ii) of Proposition 6.1 are satisfied with probability exceeding 


1 - e/6; 

lj8] will present some preliminary results for use in m and flT)| 


The proof of Corollary |4.5| Once we have shown that the conditions of Proposition |6.1| hold with 
probability exceeding 1 — e, the conclusion of Proposition |U2] automatically follows and we may conclude 
Corollary |4.5| 


8 Preliminary results 

In this section, we present four propositions which will be applied to show that the conditions of Proposi- 
are satisfied with high probability under the conditions of Theorem |4.4| with a Bernoulli sampling 
scheme. The results in this section are derived using Talagrand’s inequality and Bernstein inequalities 
(for random variables and random matrices) which we state below. 

Theorem 8.1. (Talagrand 1321 Cor. 7.8]) There exists a number K with the following property. Consider 
n independent random variables X) valued in a measurable space f l and let F be a (countable) class of 
measurable functions on f2. Let Z be the random variable Z = sup J2i< n an d define 


tion 6.1 


S=sup||/|| 00 , U=su P E ^/(X 4 ) 2 
f e.? 7 faJ 7 1 


, i<n 


IfK(f(Xi)) = 0 for all f £ F and i <n, then, for each t > 0, we have 


P (| Z -E( Z )|> ( )<3e x p|- ; iA l06 (l + T7T |_) 


where Z = sup /e ^ | £i<n f{Xi )|. 
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Theorem 8.2 (Bernstein inequality for random variables |S]). Let Z±, ..., Zm £ C be independent 
random variables with zero mean such that \Zj\ < K almost surely for all l = 1 and some 

constant K > 0. Assume also that P \ z j\ 2 < <x 2 for some constant a 2 > 0. Then for t > 0, 


M 


E^ 

5 =i 


> f 


< 4 exp 


^ 2 /4 \ 

. a 2 + Kt/(3V2)J ' 


If ..., Zm G M are real instead of complex random variables, then 


M 


yz, 

3 = 1 


> t 


< 2 exp 



^ 2 /2 \ 
+ A't/3 / ' 


Theorem 8.3 (Bernstein inequality for rectangular matrices (40]). Let Z\, ... , Zm £ C dlXd2 be inde¬ 
pendent random matrices such that EZj = 0 for each j = 1,..., M and ||^j || 2 _.2 < K almost surely for 
each j = 1,..., M and some constant K >0. Let 


a := max ■ 


M 

E E (^*) 

3=1 




M 

J2h z ; z 3) 

3=1 


t 2 ^ri 2 . 


Then, for t > 0, 


M 

E^ 

3=1 


> t < 2 (di + d 2 ) exp 


-t- 


72_\ 


t 2 ^rl 2 


Kt/3 J 


Proposition 8.4. Let g £ W and let a > 0 and 7 £ [0, 1]. Suppose that 


I TD(Q w V*P [m] VQ w - Qw)D*T~ 1 || /2 _^ < a/2. 


( 8 . 1 ) 


Then 

P (|| TDiQwV^qf'Pn, © ■ ■ ■ © q^PnJVQw - Q w )g\\ p > a\\TDg\\ e2 ) < 7 


provided that 




/3 V r 

VrB log f - j E^n.m (M)«i < a, k = l,...,r, 

'7/ 1=1 

( 8 . 2 ) 

and 




/ 3 \ r 

rB 2 log ( - j E(C X - !)Mn,m (M)«fc < a 2 , l = l,...,r, 

' k-1 

(8.3) 

where 

r 

B 2 = \\DQ W D*\\^^ max^ ||P Ai £>Q w £>*PaJ < 00 _^o.. 

t—1 



Proof. Without loss of generality, assume that ||7\D </||^ 2 = 1. Let {5j}^ =1 be random Bernoulli variables 
such that P(5j = 1) = qj where qj = qk for j = Mk~ 1 + 1,..., M*. Then, 


TDiQwV^q^Pn, © • • • © qf l Pn r )VQ w - Q w )g 
M 

= E(^ 71 ^ - 1 )TDQ w V*( ej © ej)VQ w g + TD(Q W V*P [M] VQ W - Q w )g, 

3 =1 

where © is the Kronecker product. Since D*D = I and since ( |8.1| ) holds by assumption, 

II TD(Q w V*P [m] VQ w - Qw)g \\ e 2 = \\TD(Q w V*P [m] VQ w - Q w )D*T~ l TDg\\ t2 
< \\TD(Q w V*P [m] VQ w - Q W )D*T~\ 2 ^ 2 < a/2. 
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So, it suffices to show that 


> «/ 2 j < 7 , 

where for each j = 1,... ,M, Yj = (qj' 6j — 1 )DQ w V*(ej ® ej)Vg. We will aim to apply Talagrand’s 
inequality (Theorem |8.1[ ) to obtain this probability bound. 

Let Q be a countable set of vectors in the unit ball of £ 2 (N) and for each ( G Q 1 define the linear 
functionals Ci ,02 : ^ 2 (N) —> K. by 


M 


E y i 

i=i 


e 2 


Ci(y) = Re (y, C)> < 2 ( 2 /) = —Re (?/, C), Vj/g£ 2 (N). 


Let T = {Ci ,6 = C € Gy Then, Z := sup /e ^£jii f{Yj) 



• To bound S = ma xj ||Y)||^ 2 : 

||T/||^ < qJ 1 \\TDQ w V*(e j <g> ej)Vg\\ p = qj l sup \{TDQ w V*(ej ®ej)Vg,x)\ 

IMI< 2 =i 

= q~ 1 sup \(Vg,e j ){TDQ w V*e j ,x)\ 

11 * 11 / 2=1 


For each j G Tk, 


\(Vg, ej )\ = \(VD*Dg, ej)\ < J2\(VD* P Al Dg, ej )\ < v(Pr k VD* P Al )\\P Al Dg\\ e . 

1 —1 1 =1 

Observe tha t ||r .Pg ||„ 2 = 1 implies \\P Al Dg\\ e2 < for each l = 1 ,... , r. Furthermore, by (4) 
of Corollary |5.4[ this implies that \\P Al Dg\\^ < Kiy/r. So, it follows that 

r 

\{y 9 i e j)\ < v^J 2^ vd * p ^ ki - 

1=1 


Also, if ||a ;|| £ 2 = 1, 

r r 

\(TDQ w V* ej ,x )\ 2 < V \\P Al TDQwV*ej \\ 2 2 = V — \\P Al DQ w D*DV*e j \\] 2 

rKl 

r (8.4) 

< E —II^A^OwP*|| 2 ~^2||W*e,|| 2 00 < 

i=l rKi 

where the last inequality follows because \\P Al Df\\ 2 2 < m for all / G W with ||.D/|| <00 < 1. 
Therefore, 

r 

max ||Y)|L 2 < max||£)(3vvIl*||^ 00 _ ) .^ oa v / i'E/ i N m(MW- 
J fc=1 ’ 

• To bound V = sup / 6 ^E^ii/(Tj ) 2 = sup Ce 6 E^£i |(C,^)| 2 : 

M 

v < sup^(g _1 - 1) |(e„C 5 )| 2 |(We j; Q W ^TC)| 2 

r 

< sup^ 1 - l)||l^s|& max|(We„g w P*TC)| 2 (8.5) 

C65 ^ jeFk 

r 

< E^ _1 - mPr k Vg\\ 2 e2 max 11 TDQy^V*e.j 11 2 2 . 

7 —■ Jtifc 
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By combining ||PA i -Dg , ||^ 2 < y/rni (which follows from ||TDg||^ 2 = 1) with the definition of kk , we 
obtain HPr^fosll^ < rhk■ Also, for each j £ F/., 


\\TDQ w V*ej\\ 2 e2 = V — \\PA t DQ w V*e 0 \\ 2 e2 

' r^t 
t =1 1 

= E — ||^ t BQwB*||^_, /2 ||W*e i || /00 ||P AtJ DQwV-*e i || /3 

t =i rKt 


< 


E 

£—1 


'V^ 


II dq w d* 




A\DV*e d \\ taB E \\PA t DQwD*Ai\\^ e2 \\P Al V* 


-3\\£° 


- E -\\ D QwD*\\ eoo ^ eoo E l|-F > A t A ) Qw-D*A;|| £oo ^ 00 ,U Ni M(fc, 0 

t =1 ' /=! 


where we have used, from the definition of the n) s, ||PA t -D /||^2 < ^/Ht whenever t = 1 
f £ W and ||D/|| /00 < 1. Therefore, 


II / w 

V < r\\DQ w D*\\^ tao EE^fc 1 - 1 )«^,m(*.0E ~\\PA t DQ w D*P Al \\ eaa ^, 
1=1 k = 1 t=l 

r 

< rB 2 maxy^(g~ 1 - 1) Mn,m(M) 

~ fc=i 


where we have applied 


1 ' ' 

-\\DQ W D* l^oc^ao EE UPa^Qw^PaJI^^ 

z=i t=i 

r 

< \\DQwD*\\ taB ^ ieo maxE HPa^Qw^aJ^^ =: B 2 . 

t= 1 


E 


To bound E(Z) = E J)f=i Y j 
2 

M 

E^ 


i=i 


M M 

E E H?*lk = EE 1 - 1) \\TDQwV*ej \\p\(Vg, ej 

3 =1 J=1 


< E^ 1 - 1 )II*K^I& max || TDQ W V\ 

£~i 3&Tk 


3\\P- 


This is the same upper bound as obtained in (8.5), so from the bound on V, we have that 


E 


Finally, by Jensen’s inequality, 


/ 

M 

E^ 

') 

V 

i=i 

£ 2 / 


< B 2 maxEOE - 1 )Mn,m 


k =1 


M 


/ 

M 

E 

r 

E y t 

< 

E 

E^ 


B 2 max') ^(q) 1 - 1) M (fc, Z) fo. 

i=i 

\ 

V 

i=i 

£2/ \ 

1 fc=i 


Let C = max{Ci,4(72/a}, where 


Ci = ||PQwA ) *||f=o^^ooV / ^maxEA i N.M(fc,0 K i I 

fc = l Z -■' 

Z =1 
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and 


C 2 = rB 2 ma,x^2(q k 1 - 1) Mn,m(M) «fc- 


k =1 


Note that C > 5, aC/4 > V. Suppose that C 2 < a 2 /16, then E (Z) < a/A, since (E (Z)) 2 < C 2 . By 
applying Talagrand’s inequality with the upper bound of C > S and E (Z) < a/4, 


<P(Z> ^+E(Z)) >P(|Z-E(Z)|< | 
Ca /4 


< 3ex P ^7^ lo g ( 1 + 


AKC 


^ 3 eXp '-4 KC l ° g 


V + Ca/A 


- 3 eXp( -4iW l0g(1 + 


Ca/A \ 
Ca/A + Ca/Aj 


where K is the constant from Talagrand’s inequality. So, P (Z > j ) < 7 provided that 


Cl0Sl ,7j -4 K l0g 


as well as C 2 < a 2 /16. Therefore, the require result would follow provided that 
\\DQyyD* 11^00^.^00 \pr log ^ ^ lo g (f) , 


k = 1 , • • ■ ,r, 


and 


(’)» 

v fc=i 


rB log - > - 1 ) // NiM (fc> 0 < tp min / T lo g o 


16 


K 




Proposition 8.5. Fix g € H and let a > 0 and 7 € [0,1]. Suppose that 


□ 


DQhV*P [M] VQ w D*T-^\ e2 ^ < = 


( 8 . 6 ) 


Let P = ||T ) Qvv-D*||^oo_ > ^o 0 - 


M = min / i £ N : max2 v / K max • ma xq. 1 • Pr M i VD*ej . 2 < 

7>i k— 1 L J 


= rmax{/c i }J =1 . 


T/ien M is finite and 

P(|| PiDQ^V^q^Pni 


' q r 1 Pn r )VQ w g | |^oo > a \\TDg\\p) < 7, 


provided that 


and 


Vr B log 9 a - 1 X^n,m(M)kz ^ a, k=l,...,r 


1 = 1 


rB 2 log [ — ] - 1)Mn,m (k,j)k k < a 2 

V ' / A=i 


j = l,...,r. 


Proof. Without loss of generality, assume that ||TPg || £ 2 = 1. Let {Sj}^ =1 be random Bernoulli variables 
such that P(<5j = 1) = qj where qj = qk for j = Mk~ 1 + 1,..., M*,. Observe that 


Qw = QwD*D = Qfi,D*P£D 


_L n* TD -L j 
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so we have that 


PkDQ^V*{q^P ni 

M 


q-Pn r )Vg 


= - l)P£DQbD*P£DV*(ej ® e :l )Vg + P^Q^V*P [M] VQ w D*T- x TDg. 

3 =1 

Since ||P^ DQ^ v V*P[m]VQwD*T~ 1 H^^oo < ck/ 2, it suffices to show that 


M 


1(5 i “ l) P A P <9w^*( e .j ® 
j=i 


> a /2 < 7 . 


For each i £ A c and j = 1,..., M, let 

Z) = {qj 1 5 j ~ 1)( p a D Qw v *( e j ®£j)VDg,ei). 

For each i £ A c , we will first apply Bernstein’s inequality (Theorem |8.2| ) to consider upper bounds for 
> a). Observe that 


(|e".zj |>4 


E 


(|zj| 2 ) = (q- 1 - l)\(P£DQj v V*(e j ®e j )Vg,e i )j 2 
= ( qJ 1 - l)\{V*e 3 ,Qk,PkDem{ejiVg)f 


Let B = UDQ^D 


w D *\\#*^to 


Then, 


M r 

J2H\Z}\ 2 )< sup J2^k 1 -mPr k VD*a\\U\Pr k Vg\\ 2 P 

J=i i“iui < B tP[ 

r 

<B sup Y.^k 1 -^PT k VD*)\\P Th VD*a\\ tao \\P Tk Vg\\l 
IMbi<B fc=1 

r 

<B sup Y,^k 1 ~^)^Pr k VDnY.\ a i\\\ p r^ D *ei\\^\\Pr k Vg\\% 
ll“lh 1 -- B fc=i ;eN 

r 

= B sup El^lE(C 1 -lM p r fc V J D*)||Pr fc ^*e ; ||^||Pr fc ^||, 2 2 
ll«lhi<- B ; e N k=1 


< i^sup^fo* 1 - l)^P rk VD*)\\Pr k VD*e l \\ eoo \\P rk Vg\\% 

' 6N fc =i 
r 

< B 2 max^2{q^ 1 - l)l4i M (k,j)\\Pr k Vgf e 

3 k =1 


< rP 2 maxV(g fc 1 - 1)^n.m (k,j)k k -■ cr 2 , 

7 = 1 L ' 

fc=l 


(8.7) 


where the last line follows because ||TPg ||^ 2 = 1 implies that ||P Afc .Dg ||^ 2 < yjrn kl and by definition of 
Kfc, || p r fc ^ff ||?2 < rk k . 

Also, we have that 

|2J| < q; 1 \(V*e j ,QhD*e i )\\(e j ,VD*Dg)\ 

r 

< ma xq- 1 g(P rk VQk,D*Pk) £ g(Pr k VD* P Ai ) ||P Ai P 5 ||,i 

Z=1 

r 

< \\DQwD*\\ eoo ^ eao maxq^ 1 ^2/j. NM (k,l)\\P Al Dg\\ el 

~ 1=1 

r 

< Vr\\DQk,D*\\^^ eoo max?/ 1 ^MN,M(fc, 0 ^ = : 

1 l=i 
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where the last line follows because ||Pa ( L)g\\p < y/rni along with (4) of Corollary 5.4 implies that 
\\ p M D 9 \\e ^ Vrm- 

Let TcNbe such that 

M 


P I sup 
ie T 


E^ 


3 =1 


> a = 0 


and suppose that |T C | < M. Then, by Theorem 

M 


8.2 


P ( sup 
ie A c 


M 

E 2 i 

J=i 


a 


> - < P sup 


ieT c 


E 2 J 

j=i 


and the union bound, 


> ? I < 4 |T C | exp ( — 


a 2 /16 \ 

r 2 + A'a/(6\/2) / 


which is true provided that 


, (4M\ 2 a 2 

108 S 32’ l0g 


(t) 


K < 


which are simply the assumptions of this theorem. 

It remains to show that such as set T c exists: First note that \\Dg\\ < ||T _ 1 1| ||T.D< 7 || < ^/K max with 
K mn = rmax{/Cj}l =1 . So, using the fact that D and V are isometries and hence of norm 1, notice that 


M 


E 2 i 

i=i 


= K^T 1 - l)PiDQ^V*(q^P ni © • •• © q^Pn r )Vg, <*> 
< \\Dg\\ e || DQwV*^ 1 ^ © • • • q-^VQ^D* p£ ei 


£ 2 


— yj ^max max q k \ | P (M] V Q^D* a || £2 —>• 0 


as i —>■ oo since P[m]V * s °f finite rank. Thus, for a > 0, it suffices to let 

T c := jz G N : V K max maxg” 1 \\P[M]VQw D * e i\\i 2 > aj 

which is a finite set. To conclude this proof, observe that |Y C | < M < oo. 
Proposition 8.6. Let a > 0 and 7 £ [0,1]. Suppose that 


QwV*P(- m] VQ w 


h=>u 


< a/2. 


□ 


( 8 . 8 ) 


Then, 

provided that 


{\\QwV*(qi x Pn 1 © • • • © Q r 1 Pn r )VQw - QwV*VQw\\ n n > a) < 7 


4 M 


qk 


>a 2 log - E^n.m(M)kz, k = 1 ,..., r. 


Z=1 


M 


Proof. Let qj = q for j G IV Let {<5j }^ =1 be Bernoulli random variables such that P (8j = 1) = qj. 

|| QwV*(q^P ni © ■ ■ • © q^PaJVQw - Q W V*VQ W || < 

M 

E^” 1 ^ - 1 )DQ w V*(e j ®e j )VQ w D* 


E^j 1 S j QwV*(e j © ej)VQ w 

3=1 


< 


< 


3=1 

M 


QwV* P\m]VQw 


J2W l6 3 - 1 )DQ w V*(e j ®e j )VQ w D* 
3=1 


x/2. 
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Let 


M 


U = 1§ j - l )( e i ® e i)- 

3=1 

Then, for each J £ N, \\DQwV*UVQwD\\ is bounded above by 


\P [J] DQ W V*UVQ W D*P [J] \\ + 


* n-L 


DQ W V*UVQ W D*P r 


[J] 


Pfr ] DQ w V*UVQ w D* 


< \\P [j] DQ w V*UVQ w D*P [ j] \\+2 P^DQw 


\m 


< \\P [ j ] DQ w V*UVQ w D*P lJ] \\+2q 


-1 


P[j]DQ w 


where q = min {<y ? -} r _. Note that since DQy\> has finite rank, 


P\j]PQw 


-» 0 as J —>• oo. Let 


M = min | j £ N : 


P[j] DQw 


< qa 


}• 


Then, it suffices to show that 


P[M] DQwV*UVQwD* Pf 


[M] 


> a/4^ < 7 


For each j = 1,..., M, define Zj = (q- 1 5j — l)P,^DQ w V*(ej g) e^VQy^D*P^. We will aim to apply 
Theorem |8.3| to derive the following. 


M 

E z j 

3 = 1 


a , 

> t < r 


Notice that {Zj}^ =1 are independent mean-zero matrices. Let = P^DQwV*ej. Then, 

ii^ii <^ 1 lki®e i || <«7 1 ii^-ii 2 . 

To bound this, note that for each j = 1,..., M, 

||£j|| 2 = sup \(DQ w V*ej,a)\ 2 = sup \(DQ w D*DV*ej,a)\ 2 

aec“ aeC*,||a|| f 2 = l 

r 

< sup J2\( D Qw D * P ^ DV * e P a )\ 2 

aeC M ,||a|| i ,2=l l —1 

r 

< sup Y,\\P^DV*e j \\ 2 eoo \\P Al DQ w D*a\\ 2 el 

aeC M ,||a|| f2 =l 1=1 

r r 

< sup J2\\P^ DV * e 3\\U\\ P ^ D 9\\^ = SU P J2^m(^ 1 )\\ P M d 9\\ 

ffGW,||g||=l l=1 ffeW,||g|| = l ;=1 


2 

<!• 


So, since ||T>g , || = 1 implies that ||Pa,-D(/||^i < kz(N,s) for l = 1,..., r by (4) of Corollary 
that 

r 

\\Zj\\ < nra xqj 1 sup V 0 s ) == A '- 

J=1 


5.4 


we have 
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Also, 


M 

E E (^) 


= sup 

M 

E^” 1 “ X )((0 ® lj) X , (0 ® lj) X ) 

2=1 

p~ 

\\x\\ £ 2=1 

2=1 


= sup 
11 * 11=1 


M 


E 7 '/ 1 “ l )(tptj)(tp x )(tp x ) 
2=1 

M 


< 


< 


sup fnMx{(g fc 1 -l)||a|| 2 }) |(e i ,y£)*P [J g ] a;) 

11*11=1 V 1 > J j=1 1 

sup (mLtUq^ 1 - 1) HCfcll 2 }) liy^lllkH 
11*11=1 \ fc=i l J / 


{(?/ - !) IlCfef} 


< K. 


Thus, by Theorem |8.3 


P (\\QwV*{q 1 1 Pn 1 ® • • • © q r 1 Pn r )VQ w - Q W V*VQ W \\ H ^ H > a) < 4 M exp K ^^a/o ) ~ 7 


provided that 

log (t) 

which are implied by the given assumptions. 


K S 16’ log 


4 M 


K < a/2, 


(8.9) 


□ 


Proposition 8.7. Let a > 0 and let 7 £ [0,1]. Let 


M = min li £ N : sup \\P [M] VD*ej\\ £2 + Q n ^ D , P[N]) D*ej 

j>i 


P 


<J 5 4 


Then M is finite and 


P (sup \\P {j} DQ^V*(q^P ni 0 • ■ ■ © q^PnJVQ^D'P^W > 7 ) < 7 
VieN 4/ 

provided that for each k = 1,..., r and each j £ N, 

1 >B 2 max( 9 “ 1 - 1) Mn.m (k,j) log 
2=1 



where B = ||Z3QvvP ) * 


Proof. Let {8j be Bernoulli random variables such that P(<5j = 1) = q-j where qj = qk for j = T k . 
Observe that for each j £ N, 


P {j} DQ^V*(qf x Sh © • • • © q-^VQ^D'Pw || 

M 

= - l)P{j}DQwV*(e k ®e k )VQbD*P {j} + P {j} DQ^V*P [M] VQ^D*P {j} 

k =1 


< 


M 

E'- 1 

k =1 


(q k 1 S k - 1 ) \{VQw D *ej,e k )\ 


+ 1, 
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where we have applied ||y|| = ||Z?|| = 1 in the last line. For each j £ N and k = 1 define 

Z 3 k = (q k 1 5k — 1) | (VQwD* e j, ej.) | . To prove this proposition, we need to derive conditions under 
which 


P ( sup 

\je N 


Y. z l 



( 8 . 10 ) 


We first seek to apply Theorem 


to analyse P ^ 


7 i i 

Z^fc=1 ^ 4 


for each j £ N. Observe that 


< sup max(q l - 1) ^{P Tl VQ^D*P {j }) =: K, 
jew 1=1 


and 


M M 

^E(|^| 2 ) = ^(g- 1 - 1) |(Fg^ ej ,e fe )| 4 

k= 1 k =1 

< nmx(^ _1 - l) n 2 {P Tl VQwD*P{j}) \\VQ^ v D*ej\\ 2 p 

< supmax(gf 1 - 1) n{P Tl VQ^,D*P {j} ) 2 =: cr 2 . 
jew * =1 


Thus, by applying Theorem 


r 

E z i 



< 2 exp 



1/32 \ 

+ K /12 J ' 


In order to use this to bound (8.10), 
be such that T c is a finite set and 


we will proceed as in Proposition 8.5 to show that there exists T 


SU P \\ p {j} D Qw v *(Qi ^1 

jer 


>q- 1 n r )VQ^D*P {j} I 


2 > 4 I 0- 


Let q = min{gfc : k = 1,... ,r}. Since P[m]VD* and Qk(d*p [n] )D* are both of finite rank. 


\P {j} DQ^V*{q^£h © • • • © q- L n r )VQ^D*P {j} \\ e2 ^ 2 
<-\\P[M]VQwD* e j \\{2 < ~ (\\P[M]VD*ej\\ e2 + Qn(D*p [N] )D*ej ^ ->• 0 


1 

q" l "" J "" ‘" ll " q 

as j —► oo. Therefore, it suffices to let 


\\P [M] VD* ej \\ i2 + Qn(D*p [N] )D*ej 


p 




which is a finite set. Observe also that |T C | < M < oo. Therefore, by applying the union bound, 


sup \\Pj j xDQ^V*(q 1 

ieN 


q r 1 n r )VQy i> D*P{j}\\ > - ) < 2Mexp ( - 


1/32 \ 

+ K /12 J ' 


Thus, it suffices to let, for each k = 1,..., r and each j £ N, 

1 Z (C 1 - 1 )v{Pr k VQwD*e 3 ) 2 log 
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Let B = 11 Z?<5w e j 11 - Finally, the following observation concludes the proof of this proposition. 

(C 1 - l ) K p T k VQw D *ej) 2 log 

<B sup {q- 1 - l)fi(Pr k VD*)\\Pr k VD*a\\ eao log ( — ) 

\n\ (1 <B V 7 / 

<B sup Kl (C 1 - 1 )v( p r k VD*)\\P rk VD*ej\\ eoc , log ( — ) 

IMI/i < B J^ \ 7 / 

< B 2 max^fc 1 - 1 )Mn,m (k,j) log 
3 = 1 




□ 


9 Constructing the dual certificate 

This section will show that, with high probability, one can construct p G 7 Z(V*Pq) which satisfies 
conditions (iii) to (v) of Proposition 6.1 if is a Bernoulli multilevel sampling scheme satisfying 

Assumption |7.2| 


As explained in [T], the sampling model of = fli U • • • U fl r with f \ ~ Ber(q k ,Tk) is equivalent to 
the following sampling model. Q = fL U • • • £l r with 

= n^U---n^, ~Ber(g£,T fe ), k = l,...,r, j = 1, - - -, M, 

for p £ N and j such that 

(! ^ 9fc)(! - 9fc) • • • (1 - Qk) = 1 - 9k- (9.1) 

We will consider this alternative sampling model throughout this section so that we can apply the golfing 


scheme of [25] to construct the dual certificate described in Proposition 6.3 This section consists of the 
following steps: 

1. Define the dual certificate. 

2. Show that the constructed dual certificate satisfies conditions (iii) to (v) of Proposition |6.1| provided 
that certain events occur. 


3. Show that the events described in step 2 occur with high probability. 


Definition of the dual certificate Let 7 = e/ 6 . Let £ = log( 4<7 1 v /kM||DZ 1*||^ 00 _ > ^ 00 ). Define 
/iyeN, {oy}j = i and {/3j}j l =1 as follows. 


p = 8|~3i/ + log( 7 _1/2 )], 

v = l°g (&9 1 V K ma X M\\DD*\\ 

1 1 1 

9k=9k = 4 %, 

9k = Qk 

= -- = 9k, k = 1,... ,r, 

1 


1 ■ 0 

ai - a2 ~ 2 £i /2 

> — 

2 , 1 3,..., /x, 

1 

£ 


ft =02 = 4 , 0 

i ~ 4’ 

* = 3 ,. ..,p. 


For j = 1 ,..., p, define Uj : £(t 2 (N), £ 2 (N)) by 

u j = \p Qi ®---® 1 J P nt 

9\ q J r 
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Let Z 0 = -D*sgn(PA-D/) and for i = 1 , 2 , define 


Zi = Z 0 - Q w Y t , = UjVZ J _ 1 . 

i=i 

Let 0i = {!}, 0 2 = {1,2} and for * > 3, define 


©i-i U {*} ||- V*E/ i KZ i _i)||*, < a i ||T J DZ,_ 1 ||^ 2 

< ft||T£>Z i _ 1 || /a 

0 ,_i otherwise. 

y . = I Ejee, V'UjVZj.! i £ 0 < 

1 Y)_i otherwise. 

^ _ \z 0 — QwYi * £ ©i 

otherwise. 

Note that Zi £ W for each * = 1,..., /x. Define the following events. 



A: \\TD(Z i _ l -V*U i VZ i - 1 )\\ e? <a i \\TDZ i _ 1 \\ e , * = 1,2, 

Br. WPkDQbVUiVZi^W^ < PiWTDZ^Wp, * = 1 , 2 , 

^ 3 : | 0 M | > 

b a : n? = i ^ n nf =1 Pi. 

Let r(j) denote the j th element of 0^ (in order of appearance). 


Properties of the dual certificate Suppose that P 4 occurs, and let p = Y rfv ). By definition, 

and derive 

an upper bound on ||rc||^ 2 . By definition, 


p = V*Pqw for some w £ £ 2 (N). We now show that p satisfies (iii) and (iv) of Proposition 


6.1 


Z T(i) = Z 0 ~Q w J2 V*U j VZ j - 1 = (Qw-Q w V*U T{i) V)Z T(i _ 1) . (9.2) 

360(0 

1. Since D* D = /, by construction of p, 

\\Z 0 — QwpW = \\Zo — ( 5wW( J /)|| = ||^t(i/)|| 

= \\D*D(Q W — QwV*U T („)VQy i ;)Z T ( u _ 1 )\\ 

= j| T-'TDiQw - Q w V*U T{v) VQ w )Z r ^_ 1) \\ t 

Recalling the definition of (oi)f =1 and recalling that K max = rmax{fCj} and ||T _1 || < i/K max , it 
follows that 

II ^0 - Qwp\\ < || TD (Z T ( l/ _ !) - V*U t ( v )VQwZ t ( „_i)) ||^ 2 

\Z^max O r{i/) 11 BDZ T ^ l/ _ i) 11 
V 

— V^max WtdZqWp ]^[ a r(j) 

3=1 

V 

— \J K max || DD || f oo_^oo a T(j) 

3=1 

< 2 -" < 
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by our choice of v. Note that to get from the third line to the forth line, we observe that H-DZoLoo < 
\\DD*\\ toe _^ ioa , then recall from (3) of Corollary 5.4 that 11 Pa^ .Dg11< Kj for all g = D*P / \x with 
jl-Dgll^oo < 1. Therefore, 


\\TDZ 0 \\% = < II DD* 


3 =1 


TKj 




2. Recalling our dehnition of (/3j) / /=i and the estimate used in the previous step to bound || TDZ r ^ || 2 , 

V 

\\PkDQ^p\\^ < Y WPaDQwV'Ut^VQwZ^-vW^ 


3 = 1 

V 

— Y', Pr{j) \\PP^T(j-l) 11 pi 
3=1 

<\\DD % ^±i3 T{j) J lia r{i) 

3=1 *=1 


< T 1 + 


1 


2 VC 2 2 £ 


1 

C2 V ~ 1 ) ~ 2' 


3. By definition, p = V*Pq,w with w = 1 w j an d Wj = U T (j)V Z T (j_i). For each j = 1,..., u, 


Wj 


~ (^t(j)VZ t (j_i), U t (j)VZ t ^_ 1 ' ) ) Y / T ( j ) 

k= 1 \1k > 


Pn:} j> VZ T {j-i) 


— P- t U)Yj t( j)^ Pnl^ZrU-^ZrU-D) K tU) (V U t (j^VZ t (j_i^,Z t (j_i^), 


fc=l % 


where K T (j) = max 


{1/P 


T U) . u _ 


: k = 1,... 


( r|. Using (9.21, we have that 


= (U Ur^VZ^ — Z T (j_ r), Z T (j _!)) + (Z T (J _!), ^ry-i)} 
= Z T (j— 1)} + (Z T (j_ 1), ^ T (j_i)). 

Since / = D*Z3 and ||T _1 |I < K max , we have that 


| (U*P T (j)UZ T (j_ 1 ), Z T (j_!)) | — | (DZ t ^, DZ T (j_xj) + (DZ T (j_x' ) ,DZ T (j_i' ) 
— II^tW-dII (||P^r(j)ll \\PZ T (j-i) ||) 

— K max WTDZtU-vWs, (\\TDZ rU) \\ e2 + ||x , z?^ r (j—i) H^a) • 

Using \\TDZ TU) \\ e2 < a T ^)\\TDZ T ^_ 1} ||^ 2 , we obtain, 

2 


Therefore, 


| (U U T (j,') U Z T (j — 1) , Z T (j‘_ 1)) U ^max (^r(j) *f f) ^ 

llPj llf 2 — \J^T(j) s max ( a r(j) T 1) ^ • 


Recall that for A; = 1,..., r, q 3 k = qk /4 for j = 1, 2 and q J k = qk for all j > 3. Let K = max£ =1 {q ^ 1 } 
and for j > 3, first note that (1 — q\) ■■■(! — q k ) = 1 — qk implies that q\ + ■ • • + q k > qk- So, we 
have that 2(/i — 2)<p > qk since q/. = q/. = qk /4. By our choice of p, this implies that 


-l 


0wM||£>£>1 /oo _, /oo TlogP" 1 ) 


-2 ) qk> qk, 
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and for j > 3. 


K. 


-O') < 2 (8 [3log ( 8 g- 1 v ^wM||^l/c=_/=o) + log(7 _1 )] - 2 ) K. 


So, ||iUj|| <2 are bounded as follows. 


I^lllf 2 — 2 V^/iKmax ^/l + 2£l/2 ’ 11 W7 2 11 ^2 < 2 \/K K mayi \j \ + \f 2C 1 / 2 ’ 


1 


and for j > 3, 


IKIU < 


2 V 3 Vj\K max y 2j _ 1 1 £1/2 "Y^8~ 3log +log(7" 1 ) - 2 ). 


Summing these terms yields 


Ml * 2 $ 


\ 


A' K n 


'log ^g-V/tmaxMlI 


DD* 




■ log(6/e) 


log(4g-V«maxM||D£>*|| <00 ^ 00 ) 


To show that conditions (iii) to (v) of Proposition [ 6 T] are satisfied with probability exceeding 1 — 57 = 
1 — 5e/6 under Assumption 7.2 we will show that P(A£) < 7 for i = 1,2 and P (B}) < 7 for j = 1 , 2, 3. 

Proof of P(£?g) < 7 Define the random variables Xj, ■ ■ ■ , X ;i _ 2 by 


*,' = 


0 0 j +2 7 ^ ©j + lj 

1 otherwise. 


Observe that 


p(b 3 c ) = Pde^i < 1 /) = ppo + • • • + x „_ 2 >n-u). 


Although [Xj jjZ-f are not independent random variables, from [21 Eqn. (7.80) - (7.85)] the above 
probability can be controlled by independent binary random variables and the standard Chernoff bound, 
so that P(Bg) < 7 provided that 


±>F(Xj = l\X h =... = Xi k =l) 


(9.3) 


for all j = 1,..., p, — 2 and Zi,..., h~ € {1,..., p — 2} such that j £ {Zi,..., 4} and t p > 8 [3^ + 
log( 7 -1 / 2 )]. It remains to verify that (9.3 1 holds with p = 1/4. Observe that Xj = 0 whenever 

\\TD(Q W - QwV*UjVQ w )Z i _ 1 \\ p < ^TDZ^Wp 


and 


PZDQhV'UjVQwZi -1 


< j\\ TDZ i-i\\p 


for i = j + 2. Thus, (9.3) holds with p = 1/4 if 


P (\\TD{Q W - QwV^jVQ^Z^Wp > i|| TDZ^l 


1 

< 


and 


PiDQhVUjVQwZi- 1|| /00 > ^\\TDZi_ x \\ p ) < f 


(9.4) 

(9.5) 


By Proposition 8.4 (9.41 is implied by (Cl) below, and by Proposition 8.5 (9.5) is implied by (C2) 
below. 
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(Cl) Let 



||tp(q w c*p [m] rq w - Q W )D*T- 1 ||^ 2 ^ 2 < 

(9.6) 


r 

Qk> VrBYvh, m(M)«z, k = l,...,r, 

(9.7) 

and 




1 ~ rB 2 Y(Y - 1 )/ i N,M(fc-i)«t 1 3 = !. 

fc=l 

(9.8) 

(C2) Let 

||P^ J DQ^WP [M] RQ w ^T- 1 || f2 ^ oo < 

(9.9) 


log (32Af) r 

9fc ^ r 5I^n,m(M) K h k = 1,..., r, 

(9.10) 

and 

rP 2 log (32M) r 

£2 J — 1,..., r. 

(9.11) 


fc =1 


It remains to show that (Cl) a nd ( C2) are implied by Assumption 7.2 First, (|9.6|) and (9.91 are implied 

It- 1 ! 


by (a) and (b) of Assumption 
show that (d) of Assumption 


7.2 


.2 


respectively because 


and ||T|| < l/^/n m \ n 
implies conditions (9.8) and (9.11). Since (1 — q\) - - ■ n — ntl 




We now 
(!-<) = 1 -Qk 


implies that + ■ ■ ■ + q k > q k , by our choice of q l k = q k = q k j 4 and q k = q k for j > 3, it follows 
that 2(/x — 2 )q k > q k . From (d) of Assumption 7.2 we have that for some appropriate constant C, 
Qk > rB( log(e _1 ) + 1 ) log (My/K max ) qk such that {q k Y k=1 satisfies, 


1 £ 51 (% 1 “ l ) &,w{ k ij) k k, j = l,...,r. 

k= l 


So, 

2(8(3 log( 8 g~ 1 M^/ K max 11 DD* \| ;oo ) + log( 7 -1 )] - 2 )q k 
>Qk> (log(e _1 ) + l)log( 8 Mg _ V/c ma x||I>I>l / =o_^ / -)gfc. 
Since 7 = e/ 6 , it follows that q k Y Qk- Thus, it follows that given any j = 1,..., r, 


k =1 
r 

Y YMk 1 — 1 ) 

k —1 


as required. 

Finally, we show that the remaining conditions (9.7) and (9.10) are implied by (c) of Assumption [A2j 
Recall that (c) imposes that for some appropriate constant C and each k = 1 ,,r 


qk > C (log(e 1 ) + l)log(8Mg 1 5Z ^n,m(^ 0 K k- 

1=1 


Since 

2(8[31og(8g _1 Afybc max ||DP*||^ 00 _^ 00 ) + log( 7 ^ 1 )] -2 )q k > q k , 

it follows that 

r 

Qk <1 5Z ^N,M( fc > 0 K Z) 

1=1 

as required. 
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Proof of P(A£) < 7 for i = 1,2 Recall from (a) of Assumption 7.2 that 

yj ^min 


TDQ w V*P [ i I] VQ w D i 


< 


■>l 2 iyfCh 


which implies (8.1) of Proposition 8.4 with a = a±. So, by Proposition 8.4 

P(A ? )<7 

provided that 

r 

q k > C 1/2 log(3/ 7 ) VfB 

and 


z=i 


1 > C log(3 h)rB 2 J2(q k 1 - 1) Mn,m(*».?) «k> 3 = h 


k=1 


These two conditions are implied by (c) and (d) of Assumption 7.2 


Proof of P (Bf) < 7 for i = 1,2 Recall from (b) of Assumption 7.2 that 

1 


DQ^V*P [m] VQ w D* < 


8 -\/ ^’ri 


P(^)< 7 


which implies ( 8 . 6 ) with a = So, by Proposition 8.5 

provided that 

' A M\ 


and 


l>VrBlog\^-J Y^iqk 1 - l) ^h M {k,j) K k , j = 

Qk > rB 2 log k = l,...,r,. 


These two conditions are implied by (c) and (d) of Assumption 7.2 


10 Properties of the subsampled matrix 


In this section we show that conditions (i) and (ii) of Proposition 6.1 are satisfied with probability 


exceeding 1 — e /6 under Assumption 7.2 


are 


< - 
I H-yH 4’ 


Recall that conditions (i) and (ii) of Proposition 6.1 

(i) \\QwV*(qi 1 Pn 1 ® • ■ ■ © Qr lp ^vWQw — Qw 

(h) sup ieN \\P {j} DQ^ v V*(q^ 1 n 1 © • ■ ■ © q- 1 Q, r )VQ^D*P {j} 

It is sufficient to show that 

\QwV*(Qi 1 Pq 1 ® ■ • • ® qr 1 Pn r .)VQw - Qvv 


< — 

' 4- 


and 


SU P \\ p {j} D Qw v *(Qi fi i' 

tew 


> 7 ) < £ / 12 ' 
5' 


>q^n r )VQh D *P{ j} ||, 2 ^ 2 > 7 ) < e/12. 


The fact that (10. 1|) holds under Assumption 7.2 follows from Proposition 8.6 


( 10 . 1 ) 

( 10 . 2 ) 


To see that Propo sitio n | 8.7 | i mplies that (|10.2|) under Assumption 7.2 first note that by our choice 
of M and Proposition 


8.7 


( 10 . 2 ) follows if for each k,j = 1 ,..., r, 


1 > (q k ~~ 1 ) B 2 /jn,m(^i j) log 


12M 


(10.3) 


which is implied by (d) of Assumption 7.2 
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11 Concluding remarks 

Recent works Eam have identified the need for further theoretical development on the use of variable 
density sampling in compressed sensing. Furthermore, variable density sampling schemes are dependent
not only on sparsity but also on the sparsity structure of the underlying signal. To address this, [2] showed
that in the case where the sparsifying operator is associated with an orthonormal basis, by considering
levels of the sampling and sparsifying operators, the amount of subsampling possible can be described
in terms of the local coherences between the different sections and the sparsity of the underlying signal
within each level. This paper presented an extension of this result to the case where the sparsifying
operator is constructed from a tight frame. By defining the notions of localized sparsity and localized level
sparsities, we derived a recovery guarantee for multilevel sampling patterns based on local coherences
and localized level sparsities. One direction of future work would be to apply our abstract result to
analyse the use of multilevel sampling schemes in the case of Fourier sampling with multi-scale
analysis operators such as wavelet frames and shearlets. By deriving estimates on the local coherences of
such operators, one can expect to obtain a better understanding of how to exploit sparsity structure to
subsample. Finally, although this paper considered only the case of a tight frame regularizer, this does
not seem to be necessary in practice, and it is likely that estimates similar to Theorem 4.4 can be
derived by considering the canonical dual operator of D.

A The discrete Haar wavelet frame


The discrete Haar frame of redundancy two is defined as follows. Let $N = 2^p$ for some $p \in \mathbb{N}$ and let $\{c_0\} \cup \{h_{k,j} : k = 0,\dots,p-1,\ j = 1,\dots,2^k\}$ be the discrete Haar basis for $\mathbb{C}^N$. Specifically, $c_0 = 2^{-p/2}(1,\dots,1)$ and, for $k = 0,\dots,p-1$ and $j = 1,\dots,2^k$,

$$h_{k,j}[i] = \begin{cases} 2^{(k-p)/2}, & i = (j-1)2^{p-k}+1, \dots, (j-1)2^{p-k} + 2^{p-k-1}, \\ -2^{(k-p)/2}, & i = (j-1)2^{p-k} + 2^{p-k-1} + 1, \dots, j\,2^{p-k}, \\ 0, & \text{otherwise}, \end{cases} \qquad 1 \leq i \leq 2^p.$$


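Let $\tilde{c}_0 = c_0$, and for each $k = 0,\dots,p-1$, $j = 1,\dots,2^k$, let

$$\tilde{h}_{k,j}[n] = \begin{cases} h_{k,j}[n-1], & n = 2,\dots,N, \\ h_{k,j}[N], & n = 1, \end{cases}$$

that is, $\tilde{h}_{k,j}$ is the circular shift of $h_{k,j}$ by one entry.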


The discrete Haar wavelet frame of redundancy two is defined by

$$\{2^{-1/2} c_0\} \cup \{2^{-1/2} h_{k,j} : k = 0,\dots,p-1,\ j = 1,\dots,2^k\} \cup \{2^{-1/2} \tilde{c}_0\} \cup \{2^{-1/2} \tilde{h}_{k,j} : k = 0,\dots,p-1,\ j = 1,\dots,2^k\}.$$

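For analysis purposes, we will order these frame elements in increasing order of scale, with

$$(\psi_k)_{k=1}^{2N} = \big(2^{-1/2}c_0,\ 2^{-1/2}\tilde{c}_0,\ 2^{-1/2}h_{0,1},\ 2^{-1/2}\tilde{h}_{0,1},\ \dots,\ 2^{-1/2}h_{k,j},\ 2^{-1/2}\tilde{h}_{k,j},\ 2^{-1/2}h_{k,j+1},\ 2^{-1/2}\tilde{h}_{k,j+1},\ \dots,\ 2^{-1/2}h_{k+1,j},\ 2^{-1/2}\tilde{h}_{k+1,j},\ \dots,\ 2^{-1/2}h_{p-1,2^{p-1}},\ 2^{-1/2}\tilde{h}_{p-1,2^{p-1}}\big),$$

and let $Dx = (\langle x, \psi_k\rangle)_{k=1}^{2N}$. Note that $D^*D = I$, since the frame is the union of two orthonormal bases, each scaled by $2^{-1/2}$.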
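The sketch below builds the redundancy-two Haar frame for real vectors, using the ordering above. It checks $D^*D = I$ numerically for small $N$. It also evaluates $\|DD^*\|_{\ell^\infty\to\ell^\infty}$, the largest absolute row sum of the Gram matrix $DD^*$, for $N = 2,\dots,64$; this is the quantity the lemma that follows bounds.

Indices are 0-based internally. The helper names (`haar_basis`, `shift`, `haar_frame`, `gram_DtD`, `linf_norm_DDt`) are hypothetical and written only for this illustration. No particular value of the norm is claimed; the printed numbers are simply whatever the computation gives.

```python
import math


def haar_basis(p):
    """Discrete Haar basis of R^N, N = 2**p, ordered c_0, h_{0,1}, h_{1,1}, h_{1,2}, ..."""
    n = 2 ** p
    basis = [[2 ** (-p / 2)] * n]  # c_0 = 2^{-p/2}(1, ..., 1)
    for k in range(p):
        length = 2 ** (p - k)      # support length of h_{k,j}
        amp = 2 ** ((k - p) / 2)   # amplitude chosen so that ||h_{k,j}||_2 = 1
        for j in range(1, 2 ** k + 1):
            h = [0.0] * n
            start = (j - 1) * length  # 0-based index of the first support entry
            for i in range(start, start + length // 2):
                h[i] = amp
            for i in range(start + length // 2, start + length):
                h[i] = -amp
            basis.append(h)
    return basis


def shift(v):
    """Circular shift by one: w[n] = v[n-1] for n >= 2 and w[1] = v[N] (1-based)."""
    return [v[-1]] + v[:-1]


def haar_frame(p):
    """Rows psi_1, ..., psi_{2N} of D, interleaving 2^{-1/2} h and 2^{-1/2} (shifted h)."""
    s = 1 / math.sqrt(2)
    rows = []
    for h in haar_basis(p):
        rows.append([s * x for x in h])
        rows.append([s * x for x in shift(h)])
    return rows


def gram_DtD(rows):
    """D^* D = sum_k psi_k psi_k^T (real case), an N x N matrix."""
    n = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]


def linf_norm_DDt(rows):
    """||D D^*||_{l^inf -> l^inf}: maximum absolute row sum of the 2N x 2N matrix D D^*."""
    return max(sum(abs(sum(x * y for x, y in zip(ra, rb))) for rb in rows) for ra in rows)


if __name__ == "__main__":
    for p in range(1, 7):
        print(f"N = {2 ** p:3d}: ||DD*||_inf->inf = {linf_norm_DDt(haar_frame(p)):.4f}")
```

The check $D^*D = I$ only uses that both the Haar basis and its circular shift are orthonormal bases. The circular shift is unitary, so $\tilde{c}_0 = c_0$ and the shifted family is again an orthonormal basis.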

The following lemma shows that $\|DD^*\|_{\ell^\infty\to\ell^\infty}$ can be upper bounded independently of $N$.

B Existence of minimizers

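Proposition B.1. Let $\Omega \subset \mathbb{N}$ be finite and let $y \in \ell^2(\mathbb{N})$. There exists $g^* \in \mathcal{H}$ such that

$$g^* \in \mathop{\mathrm{argmin}}_{g\in\mathcal{H}} \|Dg\|_{\ell^1} \text{ subject to } \|P_\Omega V g - y\|_{\ell^2} \leq \delta.$$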

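Proof. Let $(f_n)_{n\in\mathbb{N}} \subset \mathcal{H}$ be a minimizing sequence such that $\|P_\Omega V f_n - y\|_{\ell^2} \leq \delta$ for each $n \in \mathbb{N}$ and

$$\|Df_n\|_{\ell^1} \to \inf_{g\in\mathcal{H}} \|Dg\|_{\ell^1} \text{ subject to } \|P_\Omega V g - y\|_{\ell^2} \leq \delta, \qquad n \to \infty.$$

This implies that $(Df_n)_{n\in\mathbb{N}}$ is a bounded sequence in $\ell^1(\mathbb{N})$. The dual of $c_0(\mathbb{N})$ (the Banach space of sequences converging to zero) is $\ell^1(\mathbb{N})$, so the unit ball of $\ell^1(\mathbb{N})$ is weak-* compact. Since $c_0(\mathbb{N})$ is separable, bounded sequences in $\ell^1(\mathbb{N})$ therefore have weak-* convergent subsequences. Hence there exist $x \in \ell^1(\mathbb{N})$ and a subsequence $(Df_{n_k})_{k\in\mathbb{N}}$ such that $Df_{n_k} \overset{*}{\rightharpoonup} x$ as $k \to \infty$, that is, for each $z \in c_0(\mathbb{N})$, $\langle Df_{n_k}, z\rangle \to \langle x, z\rangle$ as $k \to \infty$.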

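Since $DD^* \in \mathcal{B}(\ell^\infty(\mathbb{N}), \ell^\infty(\mathbb{N}))$, it follows that given any $z \in c_0(\mathbb{N})$, $DD^*z \in c_0(\mathbb{N})$. To see this, let $\eta > 0$. We can choose $N_1 \in \mathbb{N}$ such that $\|P_{[N]}^\perp z\|_{\ell^\infty} < \eta/(2\|DD^*\|_{\ell^\infty\to\ell^\infty})$ for all $N \geq N_1$.

For this choice of $N_1$, we can choose $N_2$ such that $\|P_{[N_1]} DD^* e_k\|_{\ell^1} < \eta/(2\|z\|_{\ell^\infty})$ for all $k \geq N_2$. This is possible because $P_{[N_1]}$ has finite rank and each coordinate $\langle DD^*e_k, e_i\rangle = \langle e_k, DD^*e_i\rangle$ tends to zero as $k \to \infty$, since $DD^*e_i \in \ell^2(\mathbb{N})$. Thus, for all $k \geq \max\{N_1, N_2\}$,

$$|\langle DD^*z, e_k\rangle| \leq |\langle DD^*P_{[N_1]}^\perp z, e_k\rangle| + |\langle DD^*P_{[N_1]} z, e_k\rangle| \leq \|DD^*\|_{\ell^\infty\to\ell^\infty}\|P_{[N_1]}^\perp z\|_{\ell^\infty} + \|z\|_{\ell^\infty}\|P_{[N_1]}DD^*e_k\|_{\ell^1} < \eta.$$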
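Therefore, $DD^*z \in c_0(\mathbb{N})$ for every $z \in c_0(\mathbb{N})$, so $\langle DD^*Df_{n_k}, z\rangle = \langle Df_{n_k}, DD^*z\rangle \to \langle x, DD^*z\rangle = \langle DD^*x, z\rangle$ as $k \to \infty$. Since $D^*D = I$, we have $DD^*Df_{n_k} = Df_{n_k}$, and consequently $Df_{n_k} \overset{*}{\rightharpoonup} DD^*x$. By the weak-* lower semicontinuity of the $\ell^1$ norm, this implies that

$$\liminf_{k\to\infty} \|Df_{n_k}\|_{\ell^1} \geq \|DD^*x\|_{\ell^1}.$$

Furthermore, $(Df_{n_k})_{k\in\mathbb{N}}$ is bounded in $\ell^2(\mathbb{N})$ and converges coordinatewise to $x$, so it converges weakly to $x$ in $\ell^2(\mathbb{N})$. Since $P_\Omega V D^*$ is a compact operator (it is of finite rank), $P_\Omega V f_{n_k} = P_\Omega V D^* D f_{n_k} \to P_\Omega V D^* x$ as $k \to \infty$. So, $\|P_\Omega V D^* x - y\|_{\ell^2} \leq \delta$.

Thus, $g^* := D^*x$ satisfies the constraint, and $\|Dg^*\|_{\ell^1} = \|DD^*x\|_{\ell^1} \leq \liminf_{k\to\infty}\|Df_{n_k}\|_{\ell^1}$, which equals the infimum. Hence $g^*$ is a minimizer.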

□ 